
3 Calculus of variations

3.1 Introduction

We are concerned with the problem of finding minima (or maxima) of a functional $F : \mathcal{D} \to \mathbb{R}$, where $\mathcal{D}$ is a subset of a (normed) linear space $\mathcal{X}$ of real-valued (or real-vector-valued) functions. The formulation of a problem of the calculus of variations requires two steps: the specification of a performance criterion, and the statement of the physical constraints that should be satisfied.

Remark 3.1: Performance Criteria

A performance criterion $F$, also called cost functional or simply cost, must be specified to evaluate the performance of a system quantitatively. The typical form of $F$ (also called the Lagrange problem of the calculus of variations) is

\[ F[x] := \int_{t_1}^{t_2} f(t, x(t), \dot{x}(t)) \, dt \]

where $t$ is the real independent variable, usually called time; $x(t) \in \mathbb{R}^{n_x}$, $n_x \ge 1$, is a real vector variable, usually called the phase variable. The functions $x(t) = (x_1(t), \dots, x_{n_x}(t))$, $t_1 \le t \le t_2$, are generally called trajectories or curves; $\dot{x}(t) \in \mathbb{R}^{n_x}$ stands for the derivative of $x(t)$ with respect to time; and $f : \mathbb{R} \times \mathbb{R}^{n_x} \times \mathbb{R}^{n_x} \to \mathbb{R}$ is a real-valued function, called a Lagrangian function or, briefly, a Lagrangian. The function $f$ measures the instantaneous cost and is sometimes called the running cost. Overall, we may thus think of $F[x]$ as depending on a real-vector-valued continuous function $x(t) \in \mathcal{X}$. A slightly different performance criterion is:

\[ F[x] = \varphi(t_1, x(t_1), t_2, x(t_2)) + \int_{t_1}^{t_2} f(t, x(t), \dot{x}(t)) \, dt \]

where the additional function $\varphi$ weights the endpoints of the phase trajectory, $(t_1, x(t_1))$ and $(t_2, x(t_2))$. This problem is called the Bolza problem. In the same way one can define a third type of cost functional, known as the Mayer problem:

\[ F[x] = \psi(t_1, x(t_1), t_2, x(t_2)) \]

It can be shown that all these formulations are equivalent, in the sense that they can be transformed into one another via a change of variables.

Definition 3.1: Calculus of Variations Basic Problem

The basic problem of the calculus of variations is defined as:

\[ \min_{x(t)} \; F[x] = \int_{t_1}^{t_2} f(t, x(t), \dot{x}(t)) \, dt \quad \text{s.t.} \quad x \in \mathcal{X} \]

For example, we can take $\mathcal{X} = \mathcal{C}^1([t_1, t_2])$, the space of real continuously differentiable functions on $[t_1, t_2]$, and as a subset of this function space $\mathcal{D} := \{x \in \mathcal{X} : x(t_1) = x_1\}$, that is, the set of continuously differentiable functions equal to a fixed $x_1$ at $t_1$. Enforcing constraints in the optimization problem reduces the set of candidate functions, i.e., not all functions in $\mathcal{X}$ are allowed. This leads to the following:

Definition 3.2: Admissible Trajectory

A trajectory $x$ in a real linear function space $\mathcal{X}$ is said to be an admissible trajectory provided that it satisfies all the physical constraints (if any) along the interval $[t_1, t_2]$. (Again we emphasize that a trajectory $x \in \mathcal{X}$ is a real- or vector-valued function; we use both notations $x$ and $x(t)$.) The set $\mathcal{D}$ of admissible trajectories is defined as

\[ \mathcal{D} := \{ x \in \mathcal{X} : x \text{ admissible} \} \]

Typically, the functions $x(t)$ are required to satisfy conditions at their end-points. Problems of the calculus of variations having end-point constraints only are often referred to as free problems of the calculus of variations. Many kinds of boundary conditions are of interest. The simplest one is to fix both end-points, e.g., $x(t_1) = x_1$ and $x(t_2) = x_2$. Then, the set of admissible trajectories can be defined as:

\[ \mathcal{D} := \{ x \in \mathcal{X} : x(t_1) = x_1, \; x(t_2) = x_2 \}. \]

In this case, we may say that we seek trajectories $x(t) \in \mathcal{X}$ joining the fixed points $(t_1, x_1)$ and $(t_2, x_2)$. Alternatively, we may require that the trajectory $x(t) \in \mathcal{X}$ join a fixed point $(t_1, x_1)$ to a specified curve $\Gamma : x = g(t)$, $t_1 \le t \le T$. Because the final time $t_2$ is now free, not only the optimal trajectory $x(t)$ must be determined, but also the optimal value of $t_2$. That is, the set of admissible trajectories is now defined as

\[ \mathcal{D} := \{ (x, t_2) \in \mathcal{X} \times [t_1, T] : x(t_1) = x_1, \; x(t_2) = g(t_2) \} \]

Besides bound constraints, another type of constraint is often required:

\[ F_k[x] := \int_{t_1}^{t_2} f_k(t, x(t), \dot{x}(t)) \, dt = C_k, \quad k = 1, \dots, m \]

for $m \ge 1$ Lagrangian functionals $F_1, \dots, F_m$ with Lagrangian functions $f_1, \dots, f_m$. These constraints are often referred to as isoperimetric constraints. Similar constraints with an inequality sign are sometimes called comparison functionals. Finally, further restrictions may be necessary in practice, such as requiring that $x(t) \ge 0$ for all or part of the optimization horizon $[t_1, t_2]$. More generally, constraints of the form

\[ \Psi(t, x(t), \dot{x}(t)) \le 0, \qquad \Phi(t, x(t), \dot{x}(t)) = 0 \]

for $t$ in some interval $I \subseteq [t_1, t_2]$ are called constraints of the Lagrangian form, or path constraints. A discussion of problems having path constraints is deferred until the following chapter on optimal control. We now give a series of examples of classical problems of the calculus of variations whose admissible function spaces present some of the constraints discussed previously. The most famous, and maybe the first, problem of the calculus of variations is the Brachistochrone Problem below.

Example 3.1 (Brachistochrone Problem). Consider the problem of finding the curve $x(\xi)$, $\xi_A \le \xi \le \xi_B$, in the vertical plane $(\xi, x)$, joining given points $A = (\xi_A, x_A)$ and $B = (\xi_B, x_B)$, $\xi_A < \xi_B$, $x_A < x_B$, such that a material point sliding along $x(\xi)$ without friction from $A$ to $B$, under gravity and with initial speed $v_A \ge 0$, reaches $B$ in minimal time (see Figure 3.1). This problem was first formulated and then solved by Johann Bernoulli, more than 300 years ago!

The objective functional $F[x]$ is the time required for the point to travel from $A$ to $B$ along the curve $x(\xi)$:

\[ F[x] = \int_{\xi_A}^{\xi_B} dt = \int_{\xi_A}^{\xi_B} \frac{ds}{v(\xi)} \]

where $s$ denotes the arc length of $x(\xi)$, defined by $ds = \sqrt{1 + x'(\xi)^2} \, d\xi$, and $v$ the velocity along $x(\xi)$. Since the point slides along $x(\xi)$ without friction, energy is conserved,

\[ \frac{1}{2} m \left( v(\xi)^2 - v_A^2 \right) + m g \left( x(\xi) - x_A \right) = 0 \]

with $m$ being the mass of the point and $g$ the gravitational acceleration. That is, $v(\xi) = \sqrt{v_A^2 - 2g(x(\xi) - x_A)}$, and

\[ F[x] = \int_{\xi_A}^{\xi_B} \frac{\sqrt{1 + x'(\xi)^2}}{\sqrt{v_A^2 - 2g(x(\xi) - x_A)}} \, d\xi \]

The Brachistochrone problem thus formulates as a Lagrange problem of the calculus of variations.


Figure 3.1: The Brachistochrone Problem

We note that this is a free problem, since only the endpoints $A = (\xi_A, x_A)$ and $B = (\xi_B, x_B)$ are given. Another classical problem of the calculus of variations with a Lagrange cost functional is the catenary problem.

Example 3.2 (Hanging rope with fixed length). Given a rope of length $L$ attached at two poles $A = (x_A, y_A)$ and $B = (x_B, y_B)$, find the function $y(x)$ that describes the shape assumed by the rope under the action of gravity. Using the infinitesimal arc length $ds$, we can express the constraint enforcing the length $L$ of the rope as an isoperimetric constraint. That is:

\[ F_1[y] = \int_{x_A}^{x_B} ds = \int_{x_A}^{x_B} \sqrt{1 + y'^2} \, dx = L \]

(Note that in this case $x$ is the independent variable and $y' = \frac{dy}{dx}$.)

Therefore the admissible space of functions $\mathcal{D}$ is a subset of a general linear function space $\mathcal{X}$, defined as:

\[ \mathcal{D} := \{ y \in \mathcal{X} : y(x_A) = y_A, \; y(x_B) = y_B, \; F_1[y] = L \} \]

Since we are seeking a static equilibrium condition, we look for a function $y$ that minimizes the potential energy. In the continuous setting the potential energy of the rope can be expressed as:

\[ F[y] = V[y] = \int_M y(x) \, g \, dm = \int_{x_A}^{x_B} y \, g \, \rho \, ds = \rho g \int_{x_A}^{x_B} y \sqrt{1 + y'^2} \, dx \]

Therefore the catenary problem is:

\[ \min_{y(x)} \; F[y] = \rho g \int_{x_A}^{x_B} y \sqrt{1 + y'^2} \, dx \quad \text{s.t.} \quad y \in \mathcal{D} \]


Figure 3.2: The Catenary Problem

A problem similar to the catenary one is Dido's isoperimetric problem which, according to Virgil's Aeneid, dates back to the foundation of Carthage around 850 B.C. Dido's problem is to find the arc of given length and given endpoints that maximizes the enclosed area.

Example 3.3 (Dido’s Problem). Given a length $L$, find the curve of length $L$ with fixed endpoints that maximizes the area between the curve and the $x$-axis. The isoperimetric constraint is again:

\[ G[y] = \int_{x_A}^{x_B} ds = \int_{x_A}^{x_B} \sqrt{1 + y'^2} \, dx = L \]

while the cost functional to maximize is:

\[ F[y] = \int_{x_A}^{x_B} y \, dx \]

3.2 Class of functions and optimality criteria

Having defined an objective functional $F[x]$ and the set of admissible trajectories $x(t) \in \mathcal{D} \subseteq \mathcal{X}$, one must then decide about the class of functions with respect to which the optimization shall be performed. The traditional choice in the calculus of variations is to consider the class of continuously differentiable functions, e.g., $\mathcal{C}^1[t_1, t_2]$. Yet, as shall be seen later on, an optimal solution may not exist in this class. In response to this, a more general class of functions shall be considered, such as the class $\hat{\mathcal{C}}^1[t_1, t_2]$ of piecewise continuously differentiable functions.

At this point, we need to define what is meant by a minimum (or a maximum) of $F[x]$ on $\mathcal{D}$. As for finite-dimensional optimization problems, we shall say that $F$ assumes its minimum value at $x^*$ provided that

\[ F[x^*] \le F[x], \quad \forall x \in \mathcal{D}. \]

This definition is global in nature and may be made without consideration of a norm (or, more generally, a distance). Yet, the specification of a norm permits an analogous description of the local behavior of $F[x]$ at a point $x^* \in \mathcal{D}$. In particular, $x^*$ is said to be a local minimum for $F[x]$ in $\mathcal{D}$, relative to the norm $\|\cdot\|$, if

\[ \exists \delta > 0 \text{ such that } F[x^*] \le F[x], \quad \forall x \in B_\delta(x^*) \cap \mathcal{D} \]

with $B_\delta(x^*) := \{x \in \mathcal{X} : \|x - x^*\| < \delta\}$. Unlike in finite-dimensional linear spaces, different norms are not necessarily equivalent in infinite-dimensional linear spaces, in the sense that $x^*$ may be a local minimum with respect to one norm but not with respect to another.
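The non-equivalence of norms can be made concrete with a quick numerical sketch (ours, not part of the text; the sequence $x_k(t) = \sin(kt)/k$ is an illustrative choice, and the maxima are approximated on a uniform grid): its maximum norm vanishes as $k \to \infty$, while the norm that also measures the derivative stays of order one.

```python
import math

def strong_norm(x, a=0.0, b=1.0, n=20001):
    # ||x||_inf approximated by sampling on a uniform grid of [a, b]
    return max(abs(x(a + (b - a) * i / (n - 1))) for i in range(n))

def weak_norm(x, dx, a=0.0, b=1.0, n=20001):
    # ||x||_{1,inf} = max|x| + max|x'|, again approximated on a grid
    return strong_norm(x, a, b, n) + strong_norm(dx, a, b, n)

k = 1000
xk = lambda t: math.sin(k * t) / k    # x_k(t)  = sin(kt)/k
dxk = lambda t: math.cos(k * t)       # x_k'(t) = cos(kt)

s = strong_norm(xk)     # small, of order 1/k
w = weak_norm(xk, dxk)  # of order 1, dominated by the derivative term
```

So a ball of small radius in one norm is not contained in any small ball of the other: convergence in the maximum norm says nothing about the derivatives.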

Having chosen the class of functions of interest as $\mathcal{C}^1[t_1, t_2]$, several norms can be used. Maybe the most natural choice of a norm on $\mathcal{C}^1[t_1, t_2]$ is

\[ \|x\|_{1,\infty} := \max_{t_1 \le t \le t_2} |x(t)| + \max_{t_1 \le t \le t_2} |\dot{x}(t)| \]

since $\mathcal{C}^1[t_1, t_2]$ endowed with $\|\cdot\|_{1,\infty}$ is a Banach space. Another choice is to endow $\mathcal{C}^1[t_1, t_2]$ with the maximum norm of continuous functions:

\[ \|x\|_\infty := \max_{t_1 \le t \le t_2} |x(t)|. \]

The norms $\|\cdot\|_\infty$ and $\|\cdot\|_{1,\infty}$ are called the strong norm and the weak norm, respectively. Similarly, we shall endow $\mathcal{C}^1([t_1, t_2])^{n_x}$ with the strong norm $\|\cdot\|_\infty$ and the weak norm $\|\cdot\|_{1,\infty}$:

\[ \|x\|_\infty := \max_{t_1 \le t \le t_2} \|x(t)\| \qquad \|x\|_{1,\infty} := \max_{t_1 \le t \le t_2} \|x(t)\| + \max_{t_1 \le t \le t_2} \|\dot{x}(t)\| \]

where $\|x(t)\|$ stands for any norm in $\mathbb{R}^{n_x}$. The strong and weak norms lead to the following definitions of a local minimum:

Definition 3.3: Strong Local Minimum, Weak Local Minimum

$x^* \in \mathcal{D}$ is said to be a strong local minimum for $F[x]$ in $\mathcal{D}$ if

\[ \exists \delta > 0 \text{ such that } F[x^*] \le F[x], \quad \forall x \in B_\delta^\infty(x^*) \cap \mathcal{D}. \]

Likewise, $x^* \in \mathcal{D}$ is said to be a weak local minimum for $F[x]$ in $\mathcal{D}$ if

\[ \exists \delta > 0 \text{ such that } F[x^*] \le F[x], \quad \forall x \in B_\delta^{1,\infty}(x^*) \cap \mathcal{D}. \]

Remark 3.2

Note that for strong (local) minima, we completely disregard the values of the derivatives of the comparison elements $x \in \mathcal{D}$. That is, a neighborhood in the topology induced by $\|\cdot\|_\infty$ contains many more curves than one in the topology induced by $\|\cdot\|_{1,\infty}$. In other words, a weak minimum need not be a strong minimum. Consider the three curves $y_0, y_1, y_2$ illustrated in Figure 3.3: $y_0$ and $y_2$ are close in the strong sense, that is, according to $\|\cdot\|_\infty$, but not in the weak sense, while $y_0$ and $y_1$ are close according to both $\|\cdot\|_\infty$ and $\|\cdot\|_{1,\infty}$. (Note that, strictly speaking, to take the $\|\cdot\|_{1,\infty}$ norm of $y_2$ we should make it $\mathcal{C}^1$ by smoothing the corners.)


Figure 3.3: Closeness in the weak and strong sense

These important considerations are illustrated in the following example.

Example 3.4 (A Weak Minimum that is Not a Strong One). Consider the problem P of minimizing the functional

\[ F[x] := \int_0^1 \dot{x}(t)^2 \left( 1 - \dot{x}(t)^2 \right) dt \]

on $\mathcal{D} := \{x \in \hat{\mathcal{C}}^1[0,1] : x(0) = x(1) = 0\}$. We first show that the function $\bar{x}(t) = 0$, $0 \le t \le 1$, is a weak local minimum for P in the topology induced by $\|\cdot\|_{1,\infty}$. Consider the open ball of radius 1 centered at $\bar{x}$, i.e. $B_1^{1,\infty}(\bar{x})$. For every $x \in B_1^{1,\infty}(\bar{x})$, we have

\[ |\dot{x}(t)| \le 1, \quad t \in [0, 1] \]

hence $F[x] \ge 0$. This proves that $\bar{x}$ is a local minimum for P, since $F[\bar{x}] = 0$. In the topology induced by $\|\cdot\|_\infty$, on the other hand, the admissible trajectories $x \in B_\delta^\infty(\bar{x})$ are allowed to take arbitrarily large slopes $\dot{x}(t)$, $0 \le t \le 1$. Consider the sequence of functions defined by:

\[ x_k(t) := \begin{cases} \frac{1}{k} + 2t - 1 & \text{if } \frac{1}{2} - \frac{1}{2k} \le t \le \frac{1}{2} \\[2pt] \frac{1}{k} - 2t + 1 & \text{if } \frac{1}{2} \le t \le \frac{1}{2} + \frac{1}{2k} \\[2pt] 0 & \text{otherwise} \end{cases} \]

Clearly, $x_k \in \hat{\mathcal{C}}^1[0,1]$ and $x_k(0) = x_k(1) = 0$ for each $k \ge 1$, i.e., $x_k \in \mathcal{D}$. Moreover,

\[ \|x_k\|_\infty = \max_{0 \le t \le 1} |x_k(t)| = \frac{1}{k} \]

meaning that for every $\delta > 0$ there is a $k \ge 1$ such that $x_k \in B_\delta^\infty(\bar{x})$. Finally,

\[ F[x_k] = \int_0^1 \dot{x}_k(t)^2 \left( 1 - \dot{x}_k(t)^2 \right) dt = \left[ -12\, t \right]_{\frac{1}{2} - \frac{1}{2k}}^{\frac{1}{2} + \frac{1}{2k}} = -\frac{12}{k} < 0 \]

for each $k \ge 1$. Therefore, the trajectory $\bar{x}$ cannot be a strong local minimum for P.
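The value $F[x_k] = -12/k$ can be corroborated numerically: since $\dot{x}_k = \pm 2$ on an interval of total length $1/k$ and $0$ elsewhere, the integrand equals $4(1-4) = -12$ on that interval. The sketch below (our own check, not part of the text) confirms this with midpoint quadrature.

```python
def xk_dot(t, k):
    # derivative of the sawtooth x_k: +2 on the rising edge, -2 on the falling edge
    if 0.5 - 0.5 / k < t < 0.5:
        return 2.0
    if 0.5 < t < 0.5 + 0.5 / k:
        return -2.0
    return 0.0

def F(k, n=400000):
    # midpoint quadrature of the Lagrangian xdot^2 * (1 - xdot^2) on [0, 1]
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        d = xk_dot((i + 0.5) * h, k)
        total += d * d * (1.0 - d * d) * h
    return total

k = 10
approx = F(k)        # numerical value of F[x_k]
exact = -12.0 / k    # closed-form value derived above
```

The two values agree to quadrature accuracy, confirming that $F[x_k] < 0$ while $\|x_k\|_\infty \to 0$.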

3.2.1 Existence of an Optimal Solution

Prior to deriving conditions that must be satisfied for a function to be a minimum (or a maximum) of a problem of the calculus of variations, one must ask whether such solutions actually exist for that problem. In the case of optimization in finite-dimensional Euclidean spaces, it has been shown that a continuous function on a nonempty compact set assumes its minimum (and its maximum) value in that set (see Weierstrass Theorem 2.1). It is possible to extend this result to continuous functionals on a nonempty compact subset of a normed function space. But as attractive as this solution to the problem of establishing the existence of maxima and minima may appear, it is of little help, because most of the sets of interest are too "large" to be compact.

The principal reason is that most of the sets of interest are not bounded with respect to the norms of interest. As an example, the set $\mathcal{D} := \{x \in \mathcal{C}^1[a,b] : x(a) = x_a, \; x(b) = x_b\}$ is clearly not compact with respect to either the strong norm $\|\cdot\|_\infty$ or the weak norm $\|\cdot\|_{1,\infty}$, for we can construct a sequence of curves in $\mathcal{D}$ (e.g., parabolic functions satisfying the boundary conditions) whose maximum values are as large as desired. That is, the problem of minimizing, say, $F[x] = -\|x\|_\infty$ or $F[x] = -\int_a^b x(t) \, dt$, on $\mathcal{D}$, does not have a solution.

A problem of the calculus of variations that does not have a minimum is addressed in the following:

Example 3.5 (A Problem with No Minimum). Consider the problem P of minimizing the functional:

\[ F[x] := \int_0^1 \sqrt{x(t)^2 + \dot{x}(t)^2} \, dt \quad \text{on } \mathcal{D} := \{x \in \mathcal{C}^1[0,1] : x(0) = 0, \; x(1) = 1\}. \]

Observe first that for any admissible trajectory $x(t)$ joining the two end-points we have

\[ F[x] = \int_0^1 \sqrt{x^2 + \dot{x}^2} \, dt > \int_0^1 |\dot{x}| \, dt \ge \int_0^1 \dot{x} \, dt = x(1) - x(0) = 1 \]

That is,

\[ F[x] > 1, \quad \forall x \in \mathcal{D}, \qquad \text{and} \qquad \inf \{ F[x] : x \in \mathcal{D} \} \ge 1. \]

Now, consider the sequence of functions $\{x_k\}$ in $\mathcal{C}^1[0,1]$ defined by $x_k(t) := t^k$. Then

\[ F[x_k] = \int_0^1 \sqrt{t^{2k} + k^2 t^{2k-2}} \, dt = \int_0^1 t^{k-1} \sqrt{t^2 + k^2} \, dt \le \int_0^1 t^{k-1} (t + k) \, dt = \frac{1}{k+1} + 1 \]

so we have $F[x_k] \xrightarrow{k \to \infty} 1$. Overall, we have thus shown that $\inf \{F[x] : x \in \mathcal{D}\} = 1$. But since $F[x] > 1$ for every $x \in \mathcal{D}$, we know with certainty that $F$ has no global minimizer on $\mathcal{D}$.
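The convergence $F[x_k] \to 1$ can also be observed numerically. The sketch below (an illustration of ours, using simple midpoint quadrature) evaluates $F[t^k]$ for growing $k$:

```python
import math

def F(k, n=100000):
    # midpoint rule for F[x_k] = int_0^1 sqrt(t^(2k) + k^2 t^(2k-2)) dt
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) * h
        total += math.sqrt(t ** (2 * k) + k * k * t ** (2 * k - 2)) * h
    return total

# each value exceeds 1, decreases with k, and obeys the bound 1 + 1/(k+1)
values = [F(k) for k in (1, 2, 5, 10, 50)]
```

The computed values creep down toward the infimum 1 without ever reaching it, which is exactly the non-existence phenomenon of the example.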

3.3 Optimality conditions for weak extrema

In this section, we derive a set of necessary optimality conditions that a trajectory $x$ must satisfy in order to be a candidate solution for a weak local minimum of a cost functional $F[x]$. Note that if we choose $\mathcal{X} = \mathcal{C}^1$, that is, we work with once continuously differentiable functions, every strong extremum (what we mean by "extremum" will be made clear later on) is automatically a weak extremum; as a consequence, the necessary conditions valid for weak extrema are necessary for strong extrema as well. We start by considering the basic problem of the calculus of variations according to Definition 3.1, in which the admissible set is:

\[ \mathcal{D} := \{ x \in \mathcal{C}^1([t_1, t_2])^{n_x} : x(t_1) = x_1, \; x(t_2) = x_2 \}, \]

that is, we are looking for a minimizing trajectory in the set of once continuously differentiable vector-valued functions that have given values at the endpoints. Recall that this is a free problem of the calculus of variations.

Definition 3.4: Perturbed functions

Given a function $x_0 \in \mathcal{D}$, we define a family of perturbed functions as:

\[ x(t) = x_0(t) + \alpha \xi(t), \]

where $\alpha \in \mathbb{R}$ is a small real parameter and $\xi(t) \in \mathcal{C}^1([t_1, t_2])^{n_x}$ are functions such that $\xi(t_1) = \xi(t_2) = 0$.

Remark 3.3

Note that by choosing $\alpha$ sufficiently small we can make $x_0$ and $x = x_0 + \alpha \xi$ arbitrarily close in the sense of the norm $\|\cdot\|_{1,\infty}$. We have that:

\[ \|x - x_0\|_{1,\infty} = \max_{t_1 \le t \le t_2} \|\alpha \xi(t)\| + \max_{t_1 \le t \le t_2} \|\alpha \dot{\xi}(t)\| = |\alpha| \, \|\xi\|_{1,\infty}. \]

A pictorial example of a perturbed function for the scalar case is given in Figure 3.4.


Figure 3.4: Perturbed function

We now give a fundamental definition that can be seen as the generalization of the directional derivative to infinite-dimensional function spaces.

Definition 3.5: First Variation of a Functional, Gâteaux Derivative

Let $F$ be a functional defined on a function space $\mathcal{X}$. Then the first variation of $F$ at $x \in \mathcal{X}$ in the direction $\xi \in \mathcal{X}$, also called the Gâteaux derivative with respect to $\xi$ at $x$, is defined as:

\[ \delta F_x[\xi] = \lim_{\alpha \to 0} \frac{F[x + \alpha \xi] - F[x]}{\alpha} = \left. \frac{\partial}{\partial \alpha} F[x + \alpha \xi] \right|_{\alpha = 0}, \]

(provided it exists). If the limit exists for all $\xi \in \mathcal{X}$, then $F$ is said to be Gâteaux differentiable at $x$.

Remark 3.4

Note that for fixed $x$ and $\xi$, the functional $F[x + \alpha \xi]$ is actually a real-valued function of $\alpha$, that is, $g(\alpha) \triangleq F[x + \alpha \xi]$. The Gâteaux derivative is then simply the first derivative of the function $g$ with respect to $\alpha$ evaluated at $\alpha = 0$, i.e. $\delta F_x[\xi] = \left. \frac{dg}{d\alpha} \right|_0$.

Remark 3.5

Note that the Gâteaux derivative $\delta F_x[\xi]$ need not exist in any direction $\xi \ne 0$, or it may exist in some directions and not in others. Its existence presupposes that:

  • $F[x]$ is defined,
  • $F[x + \alpha \xi]$ is defined for all sufficiently small $\alpha$.

Then:

\[ \delta F_x[\xi] = \left. \frac{\partial}{\partial \alpha} F[x + \alpha \xi] \right|_{\alpha = 0} = \left. \frac{dg}{d\alpha} \right|_0 \]

is defined only if this "ordinary" derivative with respect to the real variable $\alpha$ exists at $\alpha = 0$. If the Gâteaux derivative exists, then it is unique.

Example 3.6 (Calculation of a Gâteaux Derivative). Consider the functional $F[x] = \int_a^b x(t)^2 \, dt$. To take the Gâteaux derivative we can define $g(\alpha) := F[x + \alpha \xi]$ and differentiate it directly. That is:

\[ \begin{aligned} \delta F_x[\xi] = \left. \frac{d}{d\alpha} g(\alpha) \right|_{\alpha=0} &= \left. \left[ \frac{d}{d\alpha} \int_a^b (x(t) + \alpha \xi(t))^2 \, dt \right] \right|_{\alpha=0} \\ &= \left. \left[ \int_a^b \frac{d}{d\alpha} (x(t) + \alpha \xi(t))^2 \, dt \right] \right|_{\alpha=0} \\ &= \left. \left[ \int_a^b 2 (x(t) + \alpha \xi(t)) \, \xi(t) \, dt \right] \right|_{\alpha=0} = \int_a^b 2 \, x(t) \, \xi(t) \, dt. \end{aligned} \]

On the other hand, we can directly take the limit:

\[ \begin{aligned} \delta F_x[\xi] = \lim_{\alpha \to 0} \frac{F[x + \alpha \xi] - F[x]}{\alpha} &= \lim_{\alpha \to 0} \frac{\int_a^b (x(t) + \alpha \xi(t))^2 \, dt - \int_a^b x(t)^2 \, dt}{\alpha} \\ &= \lim_{\alpha \to 0} \frac{\int_a^b 2 \alpha \, x(t) \, \xi(t) + \alpha^2 \xi(t)^2 \, dt}{\alpha} = \int_a^b 2 \, x(t) \, \xi(t) \, dt. \end{aligned} \]
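Since the Gâteaux derivative is an ordinary derivative of $g(\alpha)$, it can be checked against a finite difference in $\alpha$. The sketch below (our own; the choices $[a,b] = [0,1]$, $x(t) = t$ and $\xi(t) = t^2$ are illustrative) compares $\frac{g(\alpha) - g(-\alpha)}{2\alpha}$ with the closed form $\int_0^1 2 x \xi \, dt = 1/2$:

```python
def integrate(func, a=0.0, b=1.0, n=100000):
    # simple midpoint quadrature on [a, b]
    h = (b - a) / n
    return sum(func(a + (i + 0.5) * h) for i in range(n)) * h

x = lambda t: t       # base trajectory x(t) = t
xi = lambda t: t * t  # perturbation direction xi(t) = t^2

def g(alpha):
    # g(alpha) = F[x + alpha*xi] with F[x] = int_0^1 x(t)^2 dt
    return integrate(lambda t: (x(t) + alpha * xi(t)) ** 2)

alpha = 1e-3
fd = (g(alpha) - g(-alpha)) / (2 * alpha)      # central finite difference
exact = integrate(lambda t: 2 * x(t) * xi(t))  # = 2 * int_0^1 t^3 dt = 1/2
```

Because $g$ is exactly quadratic in $\alpha$ here, the central difference reproduces $g'(0)$ up to quadrature and rounding error.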

Example 3.7 (Non-Existence of a Gâteaux Derivative). Consider the functional $F[x] := \int_0^1 |x(t)| \, dt$. Clearly, $F[x]$ is defined on all of $\mathcal{C}^1[0,1]$, for each continuous function $x \in \mathcal{C}^1[0,1]$ yields a continuous integrand $|x(t)|$ whose integral is finite. For $x_0(t) := 0$ and $\xi(t) := t$ we have:

\[ F[x_0 + \alpha \xi] = \int_0^1 |\alpha t| \, dt = \frac{|\alpha|}{2}, \]

therefore:

\[ \frac{F[x_0 + \alpha \xi] - F[x_0]}{\alpha} = \begin{cases} \phantom{-}\frac{1}{2} & \text{if } \alpha > 0 \\[2pt] -\frac{1}{2} & \text{if } \alpha < 0 \end{cases} \]

and a Gâteaux derivative does not exist at x 0 in the direction ξ .

Lemma 3.1: Descent Direction

Let $F$ be a functional defined on a normed linear space $(\mathcal{X}, \|\cdot\|)$ (e.g., a function space). Suppose that $F$ has a strictly negative variation $\delta F_{\bar{x}}[\xi] < 0$ at a point $\bar{x} \in \mathcal{X}$ in some direction $\xi \in \mathcal{X}$. Then $\bar{x}$ cannot be a local minimum point for $F$ (in the sense of the norm $\|\cdot\|$).

Proof. Since $\delta F_{\bar{x}}[\xi] < 0$,

\[ \exists \delta > 0 \text{ such that } \frac{F[\bar{x} + \alpha \xi] - F[\bar{x}]}{\alpha} < 0, \quad \forall \alpha \in B_\delta(0), \; \alpha \ne 0, \]

hence,

\[ F[\bar{x} + \alpha \xi] < F[\bar{x}], \quad \forall \alpha \in (0, \delta). \]

Since $\|\bar{x} + \alpha \xi - \bar{x}\| = |\alpha| \, \|\xi\| \to 0$ as $\alpha \to 0^+$, the points $\bar{x} + \alpha \xi$ are eventually in each neighborhood of $\bar{x}$, irrespective of the norm considered on $\mathcal{X}$. Thus, local minimum behavior of $F$, in the sense of Definition 3.3, is not possible in the direction $\xi$ at $\bar{x}$. □

Definition 3.6: 𝒟 -Admissible Directions

Let $F$ be a functional defined on a subset $\mathcal{D}$ of a linear space $\mathcal{X}$ and let $\bar{x} \in \mathcal{D}$. Then a direction $\xi \in \mathcal{X}$, $\xi \ne 0$, is said to be $\mathcal{D}$-admissible at $\bar{x}$ for $F$ if:

  • $\delta F_{\bar{x}}[\xi]$ exists;
  • $\bar{x} + \alpha \xi \in \mathcal{D}$ for all sufficiently small $\alpha$, that is:
    $\exists \delta > 0$ such that $\bar{x} + \alpha \xi \in \mathcal{D}$, $\forall \alpha \in B_\delta(0)$.

Remark 3.6

Note that if $\xi$ is a $\mathcal{D}$-admissible direction, then every direction $a\xi$ in the parametrized family, $a \in \mathbb{R}$, $a \ne 0$, is also $\mathcal{D}$-admissible. This simple fact follows directly from the definition of $\mathcal{D}$-admissible directions.

Remark 3.7: Properties of First Variations

It is instrumental to highlight some properties of the first variation of a functional:
Linearity. Let $F_1$ and $F_2$ be two functionals defined on $\mathcal{D}$ and $\xi$ a $\mathcal{D}$-admissible direction. Consider a linear combination of the two functionals, $\tilde{F} = a_1 F_1 + a_2 F_2$ (indeed, $\tilde{F}$ is still a functional). Then $\delta \tilde{F}_x[\xi] = a_1 \, \delta F_{1,x}[\xi] + a_2 \, \delta F_{2,x}[\xi]$. This linearity property is inherited from the linearity of the ordinary derivative.
Homogeneity. Let $F$ be a functional defined on $\mathcal{D}$ and $a\xi$, $a \in \mathbb{R}$, a family of $\mathcal{D}$-admissible directions; then $\delta F_x[a\xi] = a \, \delta F_x[\xi]$ for all $a \in \mathbb{R}$. In other words, the first variation is a homogeneous operator.

Note also that in general the first variation is not a linear operator in $\xi$, that is, $\delta F_x[a_1 \xi_1 + a_2 \xi_2] \ne a_1 \, \delta F_x[\xi_1] + a_2 \, \delta F_x[\xi_2]$ for $a_1, a_2 \in \mathbb{R}$ and $\mathcal{D}$-admissible directions $\xi_1, \xi_2$, although it may hold in some special cases. We will require linear Gâteaux derivatives to prove some results for inequality-constrained problems.

Theorem 3.1: Geometric Necessary Conditions for a Local Minimum

Let $F$ be a functional defined on a subset $\mathcal{D}$ of a normed linear space $(\mathcal{X}, \|\cdot\|)$. Suppose that $x^* \in \mathcal{D}$ is a local minimum point for $F$ on $\mathcal{D}$. Then:

\[ \delta F_{x^*}[\xi] = 0 \quad \text{for each } \mathcal{D}\text{-admissible direction } \xi \text{ at } x^*. \]

Proof. By contradiction, suppose that there exists a $\mathcal{D}$-admissible direction $\xi$ such that $\delta F_{x^*}[\xi] < 0$. Then, by Lemma 3.1, $x^*$ cannot be a local minimum for $F$. Likewise, there cannot be a $\mathcal{D}$-admissible direction $\xi$ such that $\delta F_{x^*}[\xi] > 0$: indeed, $-\xi$ is then $\mathcal{D}$-admissible with $\delta F_{x^*}[-\xi] = -\delta F_{x^*}[\xi] < 0$, so by Lemma 3.1 $x^*$ cannot be a local minimum. Overall, we must have $\delta F_{x^*}[\xi] = 0$ for each $\mathcal{D}$-admissible direction $\xi$ at $x^*$. □

Remark 3.8

Note that Theorem 3.1 can be interpreted as the usual necessary condition of optimality for unconstrained NLP problems. Let us define $g(\alpha) := F[x^* + \alpha \xi]$. The scalar function $g(\alpha)$ is the value of the functional when perturbed around a minimizing trajectory $x^*$, that is, we are assuming that $F[x^*]$ is a local minimum. Equivalently, we are stating that $g(0) \equiv F[x^*]$ is a local minimum of $g(\alpha)$. Therefore a necessary condition for $g(0)$ to be a local minimum is $\left. \frac{dg}{d\alpha} \right|_{\alpha=0} = 0$. But by the definition of $g$ we also have:

\[ \left. \frac{dg}{d\alpha} \right|_{\alpha = 0} = \delta F_{x^*}[\xi] = 0, \]

where the last equality must hold for all $\mathcal{D}$-admissible directions $\xi$.

We now state some important results needed for the derivation of the necessary conditions of optimality.

Lemma 3.2

Let $h \in \mathcal{C}[t_1, t_2]$ and $\xi \in \mathcal{D} = \{\xi \in \mathcal{C}^1[t_1, t_2] : \xi(t_1) = \xi(t_2) = 0\}$. Suppose that:

\[ \int_{t_1}^{t_2} h(t) \, \xi(t) \, dt = 0 \quad \forall \xi \in \mathcal{D} \]

Then $h(t) \equiv 0$, $\forall t \in [t_1, t_2]$.

Proof. By contradiction, suppose there is a $\tilde{t} \in [t_1, t_2]$ such that $h(\tilde{t}) \ne 0$, say $h(\tilde{t}) > 0$ (the reasoning for $h(\tilde{t}) < 0$ is analogous and is left to the reader as an exercise). By continuity of $h$ there exists a ball around $\tilde{t}$ on which $h$ is positive; formally, $\exists \delta > 0$ such that $h(t) > 0$, $\forall t \in B_\delta(\tilde{t})$. Since $\xi$ is arbitrary, we can choose a function that is zero everywhere except on $B_\delta(\tilde{t})$, where we choose it to be a positive continuously differentiable function. For such a $\xi$ we have:

\[ \int_{t_1}^{t_2} h(t) \, \xi(t) \, dt = \int_{B_\delta(\tilde{t})} h(t) \, \xi(t) \, dt > 0 \]

since both $h(t)$ and $\xi(t)$ are positive for $t \in B_\delta(\tilde{t})$, and thus we reach a contradiction. Hence, $h(t) = 0$, $\forall t \in [t_1, t_2]$. □
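The contradiction in the proof can be visualized numerically. In the sketch below (our own; the specific $h$, $\tilde t$, and bump shape are illustrative choices), $h(t) = t - \tfrac12$ on $[0,1]$ is positive near $\tilde t = 0.8$, and the $\mathcal{C}^1$ bump $\xi(t) = (\delta^2 - (t - \tilde t)^2)^2$ supported on $B_\delta(\tilde t)$ vanishes at the endpoints; the integral $\int_0^1 h \, \xi \, dt$ then comes out strictly positive, so such an $h$ cannot satisfy the hypothesis of the lemma.

```python
t_tilde, delta = 0.8, 0.1

def h(t):
    # a continuous function that is NOT identically zero: positive around t_tilde
    return t - 0.5

def xi(t):
    # C^1 bump supported on B_delta(t_tilde); zero at the endpoints of [0, 1]
    d = t - t_tilde
    return (delta ** 2 - d * d) ** 2 if abs(d) < delta else 0.0

# midpoint quadrature of int_0^1 h(t) xi(t) dt
n = 100000
integral = sum(h((i + 0.5) / n) * xi((i + 0.5) / n) for i in range(n)) / n
# integral > 0: the test direction xi "detects" the region where h > 0
```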

Remark 3.9

Note that, with simple modifications, Lemma 3.2 extends to the case:

\[ \int_{t_1}^{t_2} h(t)^\top \xi(t) \, dt = 0 \quad \forall \xi \in \mathcal{D} \]

where $h \in \mathcal{C}([t_1, t_2])^{n_x}$ and $\xi \in \mathcal{D} = \{\xi \in \mathcal{C}^1([t_1, t_2])^{n_x} : \xi(t_1) = \xi(t_2) = 0\}$. We recover the case proved by Lemma 3.2 by rewriting the previous equation as:

\[ \int_{t_1}^{t_2} h(t)^\top \xi(t) \, dt = \int_{t_1}^{t_2} \sum_{i=1}^{n_x} h_i(t) \, \xi_i(t) \, dt = 0 \quad \forall \xi \in \mathcal{D} \]

Then it is sufficient to consider functions $\xi_i$ such that $\xi_j = 0$ for $j \ne i$, and apply the result of Lemma 3.2 to each equation $\int_{t_1}^{t_2} h_i(t) \, \xi_i(t) \, dt = 0$. In this way we obtain that the vector-valued function $h(t) = 0$ for each $t \in [t_1, t_2]$.

We state another standard instrumental theorem without proof.

Theorem 3.2: Differentiation Under the Integral Sign

Let $f : \mathbb{R} \times \mathbb{R} \to \mathbb{R}$, $f = f(t, \alpha)$, be a continuous function with continuous partial derivative $f_\alpha = \frac{\partial f}{\partial \alpha}$ on $[t_1, t_2] \times [\alpha_1, \alpha_2]$. Then

\[ g(\alpha) := \int_{t_1}^{t_2} f(t, \alpha) \, dt \]

is in $\mathcal{C}^1[\alpha_1, \alpha_2]$, with derivative

\[ \frac{d}{d\alpha} g(\alpha) = \frac{d}{d\alpha} \int_{t_1}^{t_2} f(t, \alpha) \, dt = \int_{t_1}^{t_2} f_\alpha(t, \alpha) \, dt. \]

Remark 3.9 and Theorem 3.2 are used to obtain a set of necessary conditions for optimality known as the Euler equations.

Theorem 3.3: Euler’s Necessary Conditions for Optimality

Consider the problem of minimizing the functional:

\[ \min_{x(t)} \; F[x] = \int_{t_1}^{t_2} f(t, x(t), \dot{x}(t)) \, dt \quad \text{s.t.} \quad x \in \mathcal{D} \]

where the functional space is defined as $\mathcal{D} = \{x \in \mathcal{C}^1[t_1, t_2] : x(t_1) = x_1, \; x(t_2) = x_2\}$ and $f : \mathbb{R} \times \mathbb{R}^{n_x} \times \mathbb{R}^{n_x} \to \mathbb{R}$ is a continuously differentiable function. Suppose that $x^* \in \mathcal{D}$ gives a (local) minimum of $F$ on $\mathcal{D}$. Then $x^*$ satisfies the Euler equations:

\[ \frac{d}{dt} \frac{\partial f}{\partial \dot{x}}(t, x^*(t), \dot{x}^*(t)) = \frac{\partial f}{\partial x}(t, x^*(t), \dot{x}^*(t)) \]

Proof. We start by computing the first variation of the functional $F$ at the minimizing trajectory $x^*$ in a $\mathcal{D}$-admissible direction $\xi$:

\[ \begin{aligned} \delta F_{x^*}[\xi] = \left. \frac{\partial}{\partial \alpha} F[x^* + \alpha \xi] \right|_{\alpha=0} &= \left. \left[ \int_{t_1}^{t_2} \left\{ \frac{\partial f}{\partial \dot{x}}(t, x^* + \alpha \xi, \dot{x}^* + \alpha \dot{\xi}) \, \dot{\xi} + \frac{\partial f}{\partial x}(t, x^* + \alpha \xi, \dot{x}^* + \alpha \dot{\xi}) \, \xi \right\} dt \right] \right|_{\alpha=0} \\ &= \int_{t_1}^{t_2} \left\{ \frac{\partial f}{\partial \dot{x}}(t, x^*, \dot{x}^*) \, \dot{\xi} + \frac{\partial f}{\partial x}(t, x^*, \dot{x}^*) \, \xi \right\} dt. \end{aligned} \]

By Theorem 3.1 we must have:

\[ \delta F_{x^*}[\xi] = \int_{t_1}^{t_2} \left\{ \frac{\partial f}{\partial \dot{x}}(t, x^*, \dot{x}^*) \, \dot{\xi} + \frac{\partial f}{\partial x}(t, x^*, \dot{x}^*) \, \xi \right\} dt = 0 \quad \forall \, \mathcal{D}\text{-admissible } \xi. \]

By Definition 3.6, $\mathcal{D}$-admissible directions are $\mathcal{C}^1[t_1, t_2]$ functions such that $\xi(t_1) = \xi(t_2) = 0$; note that in this way $x^* + \alpha \xi \in \mathcal{D}$. We now rewrite the first variation so as to apply Lemma 3.2. Integrating by parts, we have:

\[ \int_{t_1}^{t_2} \frac{\partial f}{\partial \dot{x}}(t, x^*, \dot{x}^*) \, \dot{\xi} \, dt = \left. \left[ \frac{\partial f}{\partial \dot{x}}(t, x^*, \dot{x}^*) \, \xi \right] \right|_{t_1}^{t_2} - \int_{t_1}^{t_2} \frac{d}{dt} \frac{\partial f}{\partial \dot{x}}(t, x^*, \dot{x}^*) \, \xi \, dt \]

but since $\xi$ is a $\mathcal{D}$-admissible direction, $\xi(t_1) = \xi(t_2) = 0$, and we have:

\[ \int_{t_1}^{t_2} \frac{\partial f}{\partial \dot{x}}(t, x^*, \dot{x}^*) \, \dot{\xi} \, dt = - \int_{t_1}^{t_2} \frac{d}{dt} \frac{\partial f}{\partial \dot{x}}(t, x^*, \dot{x}^*) \, \xi \, dt. \]

The necessary condition for optimality then becomes:

\[ \delta F_{x^*}[\xi] = \int_{t_1}^{t_2} \left[ \frac{\partial f}{\partial x}(t, x^*, \dot{x}^*) - \frac{d}{dt} \frac{\partial f}{\partial \dot{x}}(t, x^*, \dot{x}^*) \right] \xi \, dt = 0 \quad \forall \, \mathcal{D}\text{-admissible } \xi. \]

The last equation satisfies the conditions of Lemma 3.2; therefore we must have:

\[ \frac{\partial f}{\partial x}(t, x^*, \dot{x}^*) - \frac{d}{dt} \frac{\partial f}{\partial \dot{x}}(t, x^*, \dot{x}^*) = 0 \quad \forall t \in [t_1, t_2]. \; \square \]

Definition 3.7: Stationary Function

Each $\mathcal{C}^1$ function $\bar{x}$ that satisfies the Euler equations on some interval is called a stationary function for the Lagrangian $f$.

Remark 3.10

Note that the Euler equations are necessary conditions for optimality; therefore a stationary function $\bar{x}$ can be a minimum, a maximum, or neither. In the literature stationary functions are also called extremals or extremal trajectories.

Example 3.8 (Simplest calculus of variations problem). Given two points in the plane, $A = (x_1, y_1)$ and $B = (x_2, y_2)$, find the curve of minimum length that joins them. A schematic picture is drawn in Figure 3.5. Obviously, the solution to this problem is the line joining $A$ and $B$, but we will work out the solution using the Euler equation. The functional that we want to minimize is the length of the curve, that is, we want to solve the problem:

\[ \min_{y(x)} \; F[y] = \int_{x_1}^{x_2} f(x, y(x), y'(x)) \, dx = \int_{x_1}^{x_2} ds = \int_{x_1}^{x_2} \sqrt{1 + y'(x)^2} \, dx \quad \text{s.t.} \quad y \in \mathcal{D} = \{ y \in \mathcal{C}^1[x_1, x_2] : y(x_1) = y_1, \; y(x_2) = y_2 \} \]

Note that with respect to the previous notation $x$ is the independent variable and we are seeking a function $y(x)$; we have also written $y'(x) = \frac{dy}{dx}$. The Lagrangian for this problem is $f = \sqrt{1 + y'(x)^2}$. Note that it is independent of $y(x)$. The Euler equation for this problem reads:

\[ \frac{d}{dx} \frac{\partial f}{\partial y'} = \frac{\partial f}{\partial y} = 0 \]

As a consequence:

\[ \frac{d}{dx} \frac{\partial f}{\partial y'} = \frac{d}{dx} \frac{y'}{\sqrt{1 + y'(x)^2}} = \frac{y''(x)}{\left( 1 + y'^2 \right)^{3/2}} = 0 \]

Since the denominator is always nonzero, we have $y''(x) = 0$; therefore stationary functions have zero second derivative. We can then integrate this condition to obtain $y$:

\[ y(x) = C_1 x + C_2 \]

Finally, imposing that $y(x) \in \mathcal{D}$, that is, imposing the boundary conditions $y(x_1) = y_1$ and $y(x_2) = y_2$, we obtain:

\[ y(x) = y_1 + \frac{y_2 - y_1}{x_2 - x_1} (x - x_1) \]
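The minimality of the straight line can be probed numerically: a small sketch of ours (the endpoints $A = (0,0)$, $B = (1,1)$ and the perturbation $\alpha \sin(\pi x)$, which vanishes at both endpoints, are illustrative choices) compares the arc length of the line with that of a perturbed admissible curve.

```python
import math

def length(dy, a=0.0, b=1.0, n=100000):
    # arc length int_a^b sqrt(1 + y'(x)^2) dx by midpoint quadrature
    h = (b - a) / n
    return sum(math.sqrt(1.0 + dy(a + (i + 0.5) * h) ** 2) for i in range(n)) * h

# line joining A = (0, 0) and B = (1, 1): y = x, so y' = 1 and length = sqrt(2)
line_len = length(lambda x: 1.0)

# perturbed curve y = x + alpha*sin(pi x) still joins A and B
alpha = 0.3
pert_len = length(lambda x: 1.0 + alpha * math.pi * math.cos(math.pi * x))
```

Any admissible perturbation of this kind strictly increases the length, as expected for the stationary (and here minimizing) line.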


Figure 3.5: Minimum length problem

Remark 3.11

Depending on the structure of the Lagrangian function $f$, the Euler equations can be simplified.

$f$ does not explicitly depend on the function $x$.
Suppose $f = f(t, \dot{x})$; then the Euler equations become $\frac{d}{dt} \frac{\partial f}{\partial \dot{x}} = 0$, i.e. $\frac{\partial f}{\partial \dot{x}}$ is constant.

$f$ does not explicitly depend on the function $\dot{x}$.
Suppose $f = f(t, x)$; then the Euler equations become $\frac{\partial f}{\partial x} = 0$. This is a degenerate case.

$f$ does not explicitly depend on the independent variable $t$.
Suppose $f = f(x, \dot{x})$. We define the Hamiltonian function $H(x, \dot{x}) = f - \frac{\partial f}{\partial \dot{x}} \dot{x}$. If $\bar{x}$ is a stationary function, then the Hamiltonian is constant along it. Indeed, the total time derivative of the Hamiltonian is:

\[ \frac{d}{dt} H = \frac{d}{dt} \left( f - \frac{\partial f}{\partial \dot{x}} \dot{x} \right) = \frac{\partial f}{\partial x} \dot{x} + \frac{\partial f}{\partial \dot{x}} \ddot{x} - \left( \frac{d}{dt} \frac{\partial f}{\partial \dot{x}} \right) \dot{x} - \frac{\partial f}{\partial \dot{x}} \ddot{x} = \left( \frac{\partial f}{\partial x} - \frac{d}{dt} \frac{\partial f}{\partial \dot{x}} \right) \dot{x} = 0 \]

Therefore an equivalent necessary condition is $H = f - \frac{\partial f}{\partial \dot{x}} \dot{x} = C$, where $C$ is a constant. This is also known as the Beltrami identity.

Example 3.9 (A more challenging problem: the Brachistochrone). Recall the Brachistochrone Problem of Example 3.1:

\[ \min_{x} \; F[x] = \int_{\xi_A}^{\xi_B} \frac{\sqrt{1 + x'(\xi)^2}}{\sqrt{v_A^2 - 2g(x(\xi) - x_A)}} \, d\xi \quad \text{s.t.} \quad x \in \mathcal{D} = \{ x \in \mathcal{C}^1[\xi_A, \xi_B] : x(\xi_A) = x_A, \; x(\xi_B) = x_B \}. \]

For the sake of simplicity, we consider a slightly simplified problem: assume that the initial velocity is $v_A = 0$ and that the initial point is the origin, $A = (0, 0)$, with the $x$-axis now taken positive downwards. The Lagrangian for this problem is $f = \frac{\sqrt{1 + x'(\xi)^2}}{\sqrt{2 g \, x(\xi)}}$. Note that the Lagrangian is independent of $\xi$; hence the Hamiltonian is constant along a stationary function. Therefore we have:

\[ H = f - \frac{\partial f}{\partial x'} x' = \frac{\sqrt{1 + x'^2}}{\sqrt{2 g x}} - \frac{x'^2}{\sqrt{2 g x} \sqrt{1 + x'^2}} = \frac{1}{\sqrt{2g}} \frac{1}{\sqrt{x (1 + x'^2)}} = C_0 \]

Which is equivalent to:

\[ x \left( 1 + x'^2 \right) = 2 C_1 \]

for a new constant $C_1 = \frac{1}{4 g C_0^2}$. Thus we have:

\[ x' = \sqrt{\frac{2 C_1 - x}{x}} \]

which is equivalent to:

\[ d\xi = \sqrt{\frac{x}{2 C_1 - x}} \, dx \]

In order to solve this integral we make the substitution $x = C_1 (1 + \cos 2\psi)$, so that $dx = -2 C_1 \sin 2\psi \, d\psi$. The integral becomes:

\[ \begin{aligned} \int_{x_A}^{x} \sqrt{\frac{x}{2 C_1 - x}} \, dx &= \int_{\psi_A}^{\psi} \sqrt{\frac{C_1 (1 + \cos 2\psi)}{2 C_1 - C_1 (1 + \cos 2\psi)}} \, (-2 C_1 \sin 2\psi) \, d\psi \\ &= -2 C_1 \int_{\psi_A}^{\psi} \sqrt{\frac{(1 + \cos 2\psi) \sin^2 2\psi}{1 - \cos 2\psi}} \, d\psi \\ &= -2 C_1 \int_{\psi_A}^{\psi} \sqrt{\frac{(1 + \cos 2\psi)(1 - \cos^2 2\psi)}{1 - \cos 2\psi}} \, d\psi \\ &= -2 C_1 \int_{\psi_A}^{\psi} (1 + \cos 2\psi) \, d\psi = a - C_1 (2\psi + \sin 2\psi) \end{aligned} \]

where $a$ is a constant to be determined. Overall we have obtained a curve parametrized by $\psi$, that is:

\[ x = C_1 (1 + \cos 2\psi), \qquad \xi = a - C_1 (2\psi + \sin 2\psi). \]

This is the parametric form of a family of cycloids, depending on the constants $C_1$ (which in turn depends on $C_0$) and $a$. The first boundary condition allows us to easily find $a$ in terms of $C_1$. That is:

\[ x_A = 0 = C_1 (1 + \cos 2\psi_A) \; \Rightarrow \; \psi_A = \frac{\pi}{2}, \qquad \xi_A = 0 = a - C_1 \left[ 2\psi_A + \sin 2\psi_A \right] \; \Rightarrow \; a = C_1 \pi \]

Thus, we obtain a smaller family of cycloids passing through the origin, as shown in Figure 3.6, whose equations are:

\[ x = C_1 (1 + \cos 2\psi), \qquad \xi = C_1 (\pi - 2\psi - \sin 2\psi) \]

note that these are alle minimum time curves. The second boundary condition is somewhat more involved and is solved numerically, the nonlinear system is:

x B = C 1 ( 1 + c o s ( 2 ψ B ) ) ξ B = C 1 ( π 2 ψ s i n ( 2 ψ B ) )

The solution joining points A = ( 0 , 0 ) and B = ( 6 , 5 ) is shown in red in Figure 3.6 . The constant C 1 = 2 . 6 2 .


Figure 3.6: Minimum-time curves originating from point $A$ in blue; minimum-time curve joining points $A$ and $B$ in red.
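The nonlinear system above is readily solved by bisection once $C_1$ is eliminated by dividing the two boundary equations. The sketch below is a minimal pure-Python illustration (function names are ours; we assume the ratio $\xi_B/x_B$ lies in $(0, \pi/2)$, the range attained by this family of cycloids); for $B = (6,5)$ it recovers $C_1 \approx 2.62$ as quoted above.

```python
import math

def ratio(psi):
    # xi_B / x_B as a function of psi_B, obtained by dividing the two
    # boundary equations, which eliminates C1
    return (math.pi - 2*psi - math.sin(2*psi)) / (1 + math.cos(2*psi))

def solve_cycloid(xi_B, x_B, tol=1e-12):
    """Bisect for psi_B, then recover C1 from x_B = C1 (1 + cos(2 psi_B))."""
    target = xi_B / x_B
    lo, hi = 1e-9, math.pi/2 - 1e-9   # ratio decreases from ~pi/2 to 0 here
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if ratio(mid) > target:
            lo = mid
        else:
            hi = mid
    psi_B = 0.5 * (lo + hi)
    C1 = x_B / (1 + math.cos(2*psi_B))
    return C1, psi_B

C1, psi_B = solve_cycloid(6.0, 5.0)   # C1 comes out close to 2.62
```

The same root could of course be found with any library root-finder; bisection is used here only to keep the sketch dependency-free.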

Remark 3.12: Hamilton’s Interpretation of Lagrangian Mechanics

According to the principle of least action, a system in motion follows the trajectory that minimizes the action functional:

A [ q ] = t 1 t 2 L ( q , q ˙ ) d t = t 1 t 2 ( T ( q , q ˙ ) V ( q ) ) d t

where q 𝒞 1 [ t 1 , t 2 ] n are the generalized lagrangian coordinates of an n degrees of freedom system. T ( q , q ˙ ) is the kinetic energy while V ( q ) is the potential energy. The Euler necessary condition reads:

$$\frac{\partial}{\partial q}(T - V) - \frac{d}{dt}\frac{\partial}{\partial \dot q}(T - V) = 0 \;\Rightarrow\; \frac{d}{dt}\frac{\partial T}{\partial \dot q} - \frac{\partial T}{\partial q} + \frac{\partial V}{\partial q} = 0,$$

which is the usual Lagrange equation of analytical mechanics. In view of this mechanical interpretation, recall the definition of the Hamiltonian $H = f - f_{\dot x}^{\top}\dot x$. In general the kinetic energy has the form $T(q,\dot q) = \frac{1}{2}\,\dot q^{\top} M(q)\,\dot q$. In this case the Hamiltonian reads:

$$H = T - V - \frac{\partial(T-V)}{\partial \dot q}^{\!\top}\dot q = T - V - 2T = -(T + V) = -E,$$

where $E$ is the total energy of the system. According to Remark 3.11, if the Lagrangian is independent of time $t$, the Hamiltonian is constant along an extremal trajectory; in other words, the total energy of the system is conserved. Finally, recall that the momentum $m$ of a mechanical system can be defined as $m = \frac{\partial T}{\partial \dot q}$; hence the special case of Remark 3.11 in which the Lagrangian is independent of $x$ can be viewed as momentum conservation, that is $\frac{d}{dt}\frac{\partial T}{\partial \dot q} = \frac{d}{dt} m = 0$.
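As a quick numerical illustration of this remark (our own toy example, not from the text): for the unit-mass harmonic oscillator with $T = \frac{1}{2}\dot q^2$ and $V = \frac{1}{2}q^2$, the curve $q(t) = \cos t$ solves the Lagrange equation $\ddot q + q = 0$, and the total energy $E = T + V$ is indeed constant along it.

```python
import math

def q(t):    return math.cos(t)       # extremal of L = q̇²/2 - q²/2
def qdot(t): return -math.sin(t)

def energy(t):
    T = 0.5 * qdot(t) ** 2            # kinetic energy
    V = 0.5 * q(t) ** 2               # potential energy
    return T + V                      # equals -H by the computation above

samples = [energy(0.3 * k) for k in range(20)]  # E(t) at several times
```

Every sample equals $\tfrac12$, confirming conservation of $E$ (equivalently, constancy of $H = -E$) along this extremal.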

Analogously to NLP problems, for problems of the calculus of variations we can also derive a second-order necessary condition. Recall that in the finite-dimensional NLP case the second-order necessary condition was positive semi-definiteness of the Hessian matrix, which is equivalent to the objective function being locally convex in the neighbourhood of a stationary point. Similar conditions hold in the infinite-dimensional case. First, we need the following definition:

Definition 3.8: Second Variation of a Functional

Let F be a functional defined on a function space 𝒳 . Then, the second variation of F at x 𝒳 in the direction ξ 𝒳 is defined as:

δ 2 F x [ ξ ] = 2 α 2 F [ x + α ξ ] | α = 0

(provided it exists).

Theorem 3.4: Second-Order Necessary Conditions (Legendre)

Consider the problem to minimize the functional:

min x ( t ) F [ x ] = t 1 t 2 f ( t , x ( t ) , x ˙ ( t ) ) d t s . t . x 𝒟

where the functional space is defined as 𝒟 = { x 𝒞 1 [ t 1 , t 2 ] n x such that x ( t 1 ) = x 1 x ( t 2 ) = x 2 } and f : × n x × n x a continuously differentiable function. Suppose that x 𝒟 gives a (local) minimum for F on 𝒟 . Then x satisfies the Euler equation along with the so-called Legendre condition:

$$\frac{\partial^2 f}{\partial \dot x^2} \succeq 0 \qquad \forall t \in [t_1, t_2].$$

Proof. Based on the differentiability properties of Theorem 3.2 we can compute the second variation as:

$$\delta^2 F_x[\xi] = \frac{\partial^2}{\partial\alpha^2} F[x+\alpha\xi]\Big|_{\alpha=0} = \int_{t_1}^{t_2} \frac{\partial^2}{\partial\alpha^2} f[x+\alpha\xi]\,dt\,\Big|_{\alpha=0} = \int_{t_1}^{t_2}\Bigl(\xi^{\top} f_{xx}[x+\alpha\xi]\,\xi + 2\,\xi^{\top} f_{x\dot x}[x+\alpha\xi]\,\dot\xi + \dot\xi^{\top} f_{\dot x\dot x}[x+\alpha\xi]\,\dot\xi\Bigr)dt\,\Big|_{\alpha=0}$$

where we have used the compressed notation $f[x+\alpha\xi] := f\bigl(t,\ x(t)+\alpha\xi(t),\ \dot x(t)+\alpha\dot\xi(t)\bigr)$. Substituting $\alpha = 0$ gives:

$$\delta^2 F_x[\xi] = \int_{t_1}^{t_2}\Bigl(\xi^{\top} f_{xx}[x]\,\xi + 2\,\xi^{\top} f_{x\dot x}[x]\,\dot\xi + \dot\xi^{\top} f_{\dot x\dot x}[x]\,\dot\xi\Bigr)\,dt.$$

Define the real-valued function $g(\alpha) := F[x+\alpha\xi]$. Then $\alpha = 0$ satisfies the necessary conditions for a local minimum of an unconstrained optimization problem, since $g(0) = F[x]$ is a local minimum of the functional $F$. Therefore $\frac{dg}{d\alpha}\big|_0 = 0$ and $\frac{d^2 g}{d\alpha^2}\big|_0 \geq 0$. Note that $\frac{d^2 g}{d\alpha^2}\big|_0 = \frac{\partial^2}{\partial\alpha^2} F[x+\alpha\xi]\big|_{\alpha=0} = \delta^2 F_x[\xi]$. Hence we have:

$$\frac{d^2 g}{d\alpha^2}\Big|_0 = \delta^2 F_x[\xi] = \int_{t_1}^{t_2}\Bigl(\xi^{\top} f_{xx}[x]\,\xi + 2\,\xi^{\top} f_{x\dot x}[x]\,\dot\xi + \dot\xi^{\top} f_{\dot x\dot x}[x]\,\dot\xi\Bigr)\,dt \geq 0.$$

The last equation must hold true for each perturbation ξ 𝒞 1 [ t 1 , t 2 ] such that ξ ( t 1 ) = ξ ( t 2 ) = 0 . We use Einstein’s notation to imply the double summation, that is:

$$\xi^{\top} A\,\xi = \sum_{i=1}^{n_x}\sum_{j=1}^{n_x} a_{i,j}\,\xi_i\,\xi_j \equiv a_{i,j}\,\xi_i\,\xi_j$$

To simplify the notation, let us define $A := f_{x\dot x}[x]$. Note that, due to Schwarz's theorem, $A$ is symmetric. Using integration by parts we have:

$$\int_{t_1}^{t_2} a_{i,j}\,\xi_i\,\dot\xi_j\,dt = \bigl(a_{i,j}\,\xi_i\,\xi_j\bigr)\Big|_{t_1}^{t_2} - \int_{t_1}^{t_2}\bigl(\dot a_{i,j}\,\xi_i\,\xi_j + a_{i,j}\,\dot\xi_i\,\xi_j\bigr)\,dt;$$

then, using the symmetry of $A$ and the boundary conditions on $\xi$, we have:

$$\int_{t_1}^{t_2} 2\,a_{i,j}\,\xi_i\,\dot\xi_j\,dt = -\int_{t_1}^{t_2} \dot a_{i,j}\,\xi_i\,\xi_j\,dt;$$

hence we have:

$$2\int_{t_1}^{t_2} \xi^{\top} f_{x\dot x}[x]\,\dot\xi\,dt = -\int_{t_1}^{t_2} \xi^{\top}\Bigl(\frac{d}{dt} f_{x\dot x}[x]\Bigr)\,\xi\,dt.$$

The necessary condition on the second variation becomes:

$$\delta^2 F_x[\xi] = \int_{t_1}^{t_2}\Bigl\{\xi^{\top}\Bigl(f_{xx}[x] - \frac{d}{dt} f_{x\dot x}[x]\Bigr)\,\xi + \dot\xi^{\top} f_{\dot x\dot x}[x]\,\dot\xi\Bigr\}\,dt \geq 0;$$

by Lemma 3.3, a necessary condition for $\delta^2 F_x[\xi]$ to be nonnegative is:

$$f_{\dot x\dot x}[x] = \frac{\partial^2 f}{\partial \dot x^2} \succeq 0 \qquad \forall t \in [t_1, t_2]. \qquad \square$$

Lemma 3.3

Let P ( t ) and Q ( t ) be given continuous ( n x × n x ) symmetric matrix functions on [ t 1 , t 2 ] , and let the quadratic functional:

t 1 t 2 ξ ( t ) Q ( t ) ξ ( t ) + ξ ˙ ( t ) P ( t ) ξ ˙ ( t ) d t

be defined for all ξ 𝒞 1 [ t 1 , t 2 ] n x such that ξ ( t 1 ) = ξ ( t 2 ) = 0 . Then, a necessary condition for the quadratic functional to be nonnegative for all such ξ is that P ( t ) 0 for each t [ t 1 , t 2 ] .

It would be tempting to extend the analogy with NLP problems and look for a sufficient condition. Recall that a sufficient condition for a local minimum of an unconstrained NLP was stationarity together with positive definiteness of the Hessian at the candidate solution, see Theorem 2.8. Unfortunately, the requirements that $\bar x$ be a stationary function for $F$ and that the Lagrangian $f(t,y,z)$ be strictly convex with respect to the third argument $z$ are not sufficient for $\bar x$ to be a (weak) local minimum of a free problem of the calculus of variations. Additional conditions must hold, such as the absence of points conjugate to the point $t_1$ in $[t_1,t_2]$; this condition is called Jacobi's sufficient condition. We refer the reader to the vast literature on this topic. Instead, we give a sufficient condition for a global minimum that relies on the Lagrangian $f(t,y,z)$ being jointly convex in its second and third arguments $y$ and $z$. Unfortunately this condition is seldom satisfied in practice. Recall that for a continuously differentiable function $j(x,y) : C_x \times C_y \to \mathbb{R}$, with $C_x \subseteq \mathbb{R}^{n_x}$ and $C_y \subseteq \mathbb{R}^{n_y}$, joint convexity means:

$$j(x,y) \geq j(x_0,y_0) + \nabla_x j(x_0,y_0)^{\top}[x - x_0] + \nabla_y j(x_0,y_0)^{\top}[y - y_0] \qquad \forall x \in C_x,\ \forall y \in C_y.$$

Theorem 3.5: Sufficient Conditions for a Weak Global Minimum

Consider the problem to minimize the functional:

min x ( t ) F [ x ] = t 1 t 2 f ( t , x ( t ) , x ˙ ( t ) ) d t s . t . x 𝒟

where the functional space is defined as $\mathcal{D} = \{x \in \mathcal{C}^1[t_1,t_2]^{n_x} \text{ such that } x(t_1) = x_1,\ x(t_2) = x_2\}$ and $f : \mathbb{R} \times \mathbb{R}^{n_x} \times \mathbb{R}^{n_x} \to \mathbb{R}$ is a continuously differentiable function. Suppose that the Lagrangian $f(t,y,z)$ is [strictly] jointly convex in $(y,z)$. If $x^*$ is a stationary function for the Lagrangian $f$ (i.e., it satisfies the Euler necessary conditions), then $x^*$ is also a [strict] global minimizer for $F$ on $\mathcal{D}$.

Proof.

$$F[x] - F[x^*] = \int_{t_1}^{t_2} f(t,x,\dot x) - f(t,x^*,\dot x^*)\,dt \;\geq\; \int_{t_1}^{t_2} \frac{\partial f(t,x^*,\dot x^*)}{\partial x}^{\!\top}[x - x^*] + \frac{\partial f(t,x^*,\dot x^*)}{\partial \dot x}^{\!\top}[\dot x - \dot x^*]\,dt = \int_{t_1}^{t_2}\Bigl(\frac{\partial f(t,x^*,\dot x^*)}{\partial x} - \frac{d}{dt}\frac{\partial f(t,x^*,\dot x^*)}{\partial \dot x}\Bigr)^{\!\top}[x - x^*]\,dt + \Bigl(\frac{\partial f(t,x^*,\dot x^*)}{\partial \dot x}^{\!\top}[x - x^*]\Bigr)\Big|_{t_1}^{t_2} = 0,$$

where we have used the joint convexity of $f$ in its second and third arguments. Subsequently, we made use of integration by parts, the fact that $x$ and $x^*$ agree at the boundary (that is, $x(t_1) = x^*(t_1)$ and $x(t_2) = x^*(t_2)$), and the hypothesis that $x^*$ is a stationary point. Overall we have:

$$F[x] - F[x^*] \geq 0 \qquad \forall x \in \mathcal{D},$$

which is the definition of global minimum of a functional. □

Example 3.10. Consider the problem P to minimize the functional

$$F[x] := \int_0^1 \dot x(t)^2\, dt$$

on $\mathcal{D} := \{x \in \mathcal{C}^1[0,1] : x(0) = 0,\ x(1) = x_1\}$. The Lagrangian $f(t,y,z) = z^2$ is convex in $z$ and does not depend on $y$; hence, thanks to Theorem 3.5, every stationary point is a global minimum. The Euler necessary condition for this case reduces to:

$$\frac{d}{dt} f_{\dot x} = 2\,\ddot x(t) = 0;$$

therefore, by double integration and imposing the boundary conditions, we obtain:

$$x^*(t) = t\, x_1,$$

a straight line joining the origin and the point $P = (1, x_1)$. The extremal function $x^*$ is the unique global minimum thanks to Theorem 3.5, since it is the unique stationary point of the functional to be minimized.
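We can corroborate this numerically (a small sketch with our own helper names): the straight line $x^*(t) = t\,x_1$ gives $F[x^*] = x_1^2$, and any admissible perturbation $a\sin(\pi t)$, which vanishes at both endpoints, strictly increases the cost.

```python
import math

def F(xdot, n=20000):
    # composite trapezoid rule for the cost  ∫₀¹ ẋ(t)² dt
    h = 1.0 / n
    s = 0.5 * (xdot(0.0) ** 2 + xdot(1.0) ** 2)
    s += sum(xdot(k * h) ** 2 for k in range(1, n))
    return s * h

x1 = 2.0
F_line = F(lambda t: x1)    # derivative of the extremal x*(t) = t * x1
# admissible perturbations a*sin(pi t) vanish at both endpoints
F_perts = [F(lambda t, a=a: x1 + a * math.pi * math.cos(math.pi * t))
           for a in (0.5, 0.1, 0.01)]
```

Indeed $F[x^* + a\sin(\pi t)] = x_1^2 + a^2\pi^2/2$, so every perturbed cost exceeds $F[x^*] = x_1^2$.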

3.3.1 Problems with free end points

We now consider slightly different and more involved problems. Consider the problem in the following definition:

Definition 3.9: Calculus of Variation Free End-point problem

Consider the Calculus of Variations problem with free end-point and end-time:

min x ( t ) F [ x ] = t 1 t 2 f ( t , x ( t ) , x ˙ ( t ) ) d t + ϕ ( t 2 , x ( t 2 ) ) s . t . x 𝒟 = { ( x , t 2 ) 𝒞 1 [ t 1 , T ] n x × such that x ( t 1 ) = x 1 }

Remark 3.13

The cost functional in Definition 3.9 represents a Bolza problem. Note that neither the end-time $t_2$ nor the end-point $x(t_2)$ is specified, so our search space of functions is in some sense larger. Our search space is defined as $\mathcal{C}^1[t_1,T]^{n_x}$ for sufficiently large $T$ so that it can include $t_2$. We need to derive additional conditions that an extremal must satisfy. These conditions should allow us to find an equation for the end-point $x(t_2)$ and the end-time $t_2$ at which this end-point is reached. Informally, we are thus looking for $n_x + 1$ additional equations: $n_x$ equations for $x(t_2)$ and one equation for the final time $t_2$.

We give without proof an important theorem that is instrumental to derive optimality conditions for this problem.

Theorem 3.6: Leibniz's Rule

Let $f : \mathbb{R} \times \mathbb{R} \to \mathbb{R}$, $f := f(t,\alpha)$, be a continuous function with continuous partial derivative $f_\alpha = \frac{\partial f}{\partial \alpha}$ on $[t_1,t_2] \times [\alpha_1,\alpha_2]$, and let $h \in C^1[\alpha_1,\alpha_2]$. Then

$$\frac{d}{d\alpha}\int_{t_1}^{h(\alpha)} f(t,\alpha)\,dt = \int_{t_1}^{h(\alpha)} f_\alpha(t,\alpha)\,dt + h'(\alpha)\, f\bigl(h(\alpha),\alpha\bigr).$$

Theorem 3.7: Necessary conditions for free end-point and end-time problems

Consider the problem in Definition 3.9. If x is a local minimum point then x must satisfy:

$$\frac{\partial f[x]}{\partial x} - \frac{d}{dt}\frac{\partial f[x]}{\partial \dot x} = 0 \quad \forall t \in [t_1, t_2], \qquad \frac{\partial f[x(t_2)]}{\partial \dot x} + \frac{\partial \phi(t_2, x(t_2))}{\partial x} = 0, \qquad H\bigl(t_2, x(t_2), \dot x(t_2)\bigr) + \frac{\partial \phi(t_2, x(t_2))}{\partial t} = 0.$$

Proof. Recall that the geometric necessary condition in Theorem 3.1 does not assume any specific form for the functional $F$. Hence, a necessary condition for a local minimum of a free end-point problem is again $\delta F_x[\xi] = 0$ for all $\mathcal{D}$-admissible directions $\xi$. Note that, in order to be a $\mathcal{D}$-admissible direction for this problem, $\xi \in C^1[t_1,t_2]^{n_x}$ must satisfy $\xi(t_1) = 0$. We highlight again that $t_2$ is not given and that $\xi$ need not satisfy any boundary condition at $t_2$. Since $t_2$ is not fixed, we need to allow variations of the end-time around its optimal value; thus we consider end-time variations of the form $\tilde t_2 = t_2 + \alpha\tau$. It is essential to notice that the modulating parameter $\alpha$ is the same for the variation of the end-time and for that of the trajectory $\xi$. With this assumption we do not lose generality, since $\tau$ and $\xi$ are arbitrary. The first variation of the Bolza-type functional is:

$$\delta F_x[\xi] = \Bigl(\frac{\partial}{\partial\alpha}\int_{t_1}^{t_2+\alpha\tau} f\bigl(t,\, x(t)+\alpha\xi(t),\, \dot x(t)+\alpha\dot\xi(t)\bigr)\,dt\Bigr)\Big|_{\alpha=0} + \Bigl(\frac{\partial}{\partial\alpha}\,\phi\bigl(t_2+\alpha\tau,\, x(t_2+\alpha\tau)+\alpha\xi(t_2+\alpha\tau)\bigr)\Bigr)\Big|_{\alpha=0};$$

we carry out the derivative term by term. Note that in the first term we can apply Theorem 3.6. Thus we have:

$$\Bigl(\frac{\partial}{\partial\alpha}\int_{t_1}^{t_2+\alpha\tau} f\bigl(t, x+\alpha\xi, \dot x+\alpha\dot\xi\bigr)\,dt\Bigr)\Big|_{\alpha=0} = \Bigl(\int_{t_1}^{t_2+\alpha\tau}\Bigl(\frac{\partial f[x+\alpha\xi]}{\partial x}^{\!\top}\xi + \frac{\partial f[x+\alpha\xi]}{\partial \dot x}^{\!\top}\dot\xi\Bigr)dt + \tau\, f\bigl[x(t_2+\alpha\tau)+\alpha\xi(t_2+\alpha\tau)\bigr]\Bigr)\Big|_{\alpha=0} = \int_{t_1}^{t_2}\Bigl(\frac{\partial f[x]}{\partial x}^{\!\top}\xi + \frac{\partial f[x]}{\partial \dot x}^{\!\top}\dot\xi\Bigr)dt + \tau\, f[x(t_2)];$$

again we have used the compressed notation $f[x(t_2+\alpha\tau)+\alpha\xi(t_2+\alpha\tau)] = f\bigl(t_2+\alpha\tau,\ x(t_2+\alpha\tau)+\alpha\xi(t_2+\alpha\tau),\ \dot x(t_2+\alpha\tau)+\alpha\dot\xi(t_2+\alpha\tau)\bigr)$. Using integration by parts and the boundary condition $\xi(t_1) = 0$ we have:

$$\int_{t_1}^{t_2}\Bigl(\frac{\partial f[x]}{\partial x}^{\!\top}\xi + \frac{\partial f[x]}{\partial \dot x}^{\!\top}\dot\xi\Bigr)dt = \int_{t_1}^{t_2}\Bigl(\frac{\partial f[x]}{\partial x} - \frac{d}{dt}\frac{\partial f[x]}{\partial \dot x}\Bigr)^{\!\top}\xi\,dt + \frac{\partial f[x(t_2)]}{\partial \dot x}^{\!\top}\xi(t_2),$$

while the second term in the expression of the first variation is:

$$\Bigl(\frac{\partial}{\partial\alpha}\,\phi\bigl(t_2+\alpha\tau,\ x(t_2+\alpha\tau)+\alpha\xi(t_2+\alpha\tau)\bigr)\Bigr)\Big|_{\alpha=0} = \frac{\partial\phi(t_2,x(t_2))}{\partial t}\,\tau + \frac{\partial\phi(t_2,x(t_2))}{\partial x}^{\!\top}\bigl[\dot x(t_2)\,\tau + \xi(t_2)\bigr];$$

the geometric necessary condition for optimality is therefore:

$$\delta F_x[\xi] = \int_{t_1}^{t_2}\Bigl(\frac{\partial f[x]}{\partial x} - \frac{d}{dt}\frac{\partial f[x]}{\partial \dot x}\Bigr)^{\!\top}\xi\,dt + \frac{\partial f[x(t_2)]}{\partial \dot x}^{\!\top}\xi(t_2) + \frac{\partial\phi(t_2,x(t_2))}{\partial t}\,\tau + \frac{\partial\phi(t_2,x(t_2))}{\partial x}^{\!\top}\bigl[\dot x(t_2)\,\tau + \xi(t_2)\bigr] + \tau\, f[x(t_2)] = 0 \qquad \forall\,\mathcal{D}\text{-admissible } \xi, \tau;$$

since the necessary condition must hold along any $\mathcal{D}$-admissible direction $\xi$ and end-time variation $\tau$, the underlying rationale is to choose variations that allow us to derive a set of necessary conditions in the form of differential equations or boundary conditions. First we consider a family of variations $\mathcal{D}_1$ with $\tau = 0$ and with $\xi$ vanishing also at the second endpoint, $\xi(t_2) = 0$. Note that these variations are indeed $\mathcal{D}$-admissible. Therefore we have:

δ F x [ ξ ] = t 1 t 2 ( f [ x ] x d d t f [ x ] x ˙ ) ξ d t = 0 ξ 𝒟 1

Note that the previous equation is exactly the one from which we derived the Euler necessary condition. We conclude that the Euler equation is a necessary condition and must hold at a stationary point in this case as well:

f [ x ] x d d t f [ x ] x ˙ = 0 t [ t 1 , t 2 ]

Note that t 2 is yet to be determined. Now we consider a second family of 𝒟 -admissible directions 𝒟 2 such that τ = 0 but ξ ( t 2 ) is free to vary. Since Euler equation is satisfied we have:

δ F x [ ξ ] = f [ x ( t 2 ) ] x ˙ ξ ( t 2 ) + ϕ ( t 2 , x ( t 2 ) ) x ξ ( t 2 ) = 0 ξ ( t 2 ) 𝒟 2 ,

we can conclude that:

f [ x ( t 2 ) ] x ˙ + ϕ ( t 2 , x ( t 2 ) ) x = 0 ,

this is a set of $n_x$ boundary conditions at $t_2$ that implicitly defines $x(t_2)$. These conditions are called transversality conditions. Finally, we use a family of variations $\mathcal{D}_3$ that nullifies the term $[\xi(t_2) + \tau\,\dot x(t_2)]$, i.e., variations that satisfy $\xi(t_2) = -\tau\,\dot x(t_2)$ while $\tau$ is left free to vary. Again, the Euler equation is satisfied and thus we have:

δ F x [ ξ ] = ( f [ x ( t 2 ) ] f [ x ( t 2 ) ] x ˙ x ˙ ( t 2 ) + ϕ ( t 2 , x ( t 2 ) ) t ) τ = 0 τ .

Therefore we have:

f [ x ( t 2 ) ] f [ x ( t 2 ) ] x ˙ x ˙ ( t 2 ) + ϕ ( t 2 , x ( t 2 ) ) t = 0 .

This is a scalar equation that implicitly defines $t_2$. Note that the first two terms are precisely the definition of the Hamiltonian evaluated at the final optimal instant, so the condition can be rewritten as:

$$H\bigl(t_2, x(t_2), \dot x(t_2)\bigr) + \frac{\partial\phi(t_2, x(t_2))}{\partial t} = 0. \qquad \square$$

Remark 3.14

A similar reasoning holds for more general Bolza problems in which the function $\phi$ also depends on the initial conditions, that is $\phi(t_1, x(t_1), t_2, x(t_2))$. In this case, neither the initial point $x(t_1)$ nor the initial time $t_1$ is fixed, and hence the $\mathcal{D}$-admissible directions need not satisfy $\xi(t_1) = 0$. The transversality condition and the end-point condition on the Hamiltonian must then hold also at the initial time.

Remark 3.15

The free end-point problem of Lagrange (i.e. when $\phi \equiv 0$) can be handled in the same way; applying the necessary conditions of Theorem 3.7 yields the so-called natural boundary conditions:

H ( t 2 , x ( t 2 ) , x ˙ ( t 2 ) ) = 0 f [ x ( t 2 ) ] x ˙ = 0 ,

these boundary conditions must hold at a stationary point together with the Euler equations. Note that if t 2 is fixed while x ( t 2 ) is left free only the second equation must hold and vice versa.

Example 3.11. Consider the problem to minimize the functional:

$$\min_{x \in \mathcal{D}} F[x] = \int_{t_1}^{t_2} \tfrac{1}{2}\,\dot x(t)^2\,dt + \tfrac{1}{2}\,p\,\bigl(x(t_2) - x_2\bigr)^2 \quad \text{s.t.} \quad x \in \mathcal{D} = \{x \in \mathcal{C}^1[t_1,t_2] \text{ such that } x(t_1) = x_1\}$$

in which the final time $t_2$ is fixed but the endpoint $x(t_2)$ is not; $p \in \mathbb{R}^+$ is a scalar weight. In this case $\phi(t_2, x(t_2)) = \frac{1}{2}\,p\,(x(t_2) - x_2)^2$ is a strongly convex nonnegative function that penalizes the distance of the final endpoint from the given target $x_2$. We highlight again the fact that $x(t_2)$ is not given and must be found using the transversality condition. The term under the integral sign is a strongly convex function that penalizes high values of the derivative of $x$. Informally, given an initial point $x(t_1) = x_1$, we want to find the curve that reaches a trade-off between having a small derivative along its trajectory and approaching the target $x_2$ as closely as possible. Obviously the relative weight of these objectives depends on the magnitude of $p$. For the sake of simplicity, let us consider $t_1 = 0$. The Euler necessary condition reads:

$$\ddot x(t) = 0 \;\Rightarrow\; x^*(t) = x_1 + A\,t;$$

note that we can only assign the initial boundary condition $x(t_1) = x_1$. The transversality condition reads:

$$\dot x^*(t_2) + p\,\bigl(x^*(t_2) - x_2\bigr) = 0 \;\Rightarrow\; A + p\,(x_1 + A\,t_2 - x_2) = 0 \;\Rightarrow\; A = \frac{p\,(x_2 - x_1)}{1 + p\,t_2}.$$

Therefore the extremal trajectory is:

$$x^*(t) = x_1 + \frac{p\,(x_2 - x_1)}{1 + p\,t_2}\; t;$$

note that as $p \to \infty$ we have $x^*(t) \to x_1 + \frac{x_2 - x_1}{t_2}\,t$, that is, $x^*(t_2) \to x_2$ in the limit of an infinite weight $p$. This would be the solution of the Lagrange problem with the same Lagrangian, $\phi \equiv 0$, and the endpoint $x(t_2) = x_2$ fixed. On the other hand, for a vanishing weight (i.e. $p \to 0$) we have $x^*(t) \to x_1$, a constant trajectory. Indeed a constant trajectory is the global minimum for the functional $F[x] = \int_{t_1}^{t_2} \frac{1}{2}\,\dot x^2\,dt$ when only the initial point is fixed. The extremal trajectories for a finite $p$ are straight lines in the interval $[0, t_2]$ that reach the optimal trade-off between minimizing the value of the derivative and reaching the target. A simple solution with $t_2 = 1$ and $x_1 = 1$, $x_2 = 2$ is shown in Figure 3.7. In this simple case we can also compute analytically the optimal value of the cost functional as a function of the weight $p$. Since $\dot x^* = \frac{p}{1+p}$ and $x^*(t_2) = 1 + \frac{p}{1+p}$, we have:

$$F[x^*] = \int_0^1 \frac{1}{2}\Bigl(\frac{p}{1+p}\Bigr)^2 dt + \frac{1}{2}\,p\,\Bigl(1 + \frac{p}{1+p} - 2\Bigr)^2 = \frac{p}{2\,(1+p)}.$$
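The closed form is easy to check in code (a sketch; the variable names are ours). With $t_1 = 0$, $t_2 = 1$, $x_1 = 1$, $x_2 = 2$, the extremal has constant slope $A = p/(1+p)$, so the integral term is simply $\frac{1}{2}A^2$:

```python
def cost(p):
    A = p / (1.0 + p)                # slope from the transversality condition
    integral = 0.5 * A ** 2          # ∫₀¹ ½ (ẋ*)² dt, the integrand is constant
    penalty = 0.5 * p * ((1.0 + A) - 2.0) ** 2   # ½ p (x*(1) - x₂)²
    return integral + penalty

# cost(p) agrees with p / (2 (1 + p)) for any positive weight p
```

For instance, both $p \to 0$ (cost $\to 0$, constant trajectory) and $p \to \infty$ (cost $\to \frac12$, endpoint pinned at $x_2$) are recovered.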


Figure 3.7: Extremal trajectories for increasing weight $p$. Limiting cases $p \to \infty$ and $p \to 0$ shown in green and red respectively.

3.4 Piecewise 𝒞 1 extremal functions

In all the problems examined so far, the functions defining the class for optimization were required to be continuously differentiable that is x 𝒞 1 [ t 1 , t 2 ] n x . Yet, it is natural to wonder whether cornered trajectories, i.e., trajectories represented by piecewise continuously differentiable functions, might not yield improved results. Besides improvement, it is also natural to wonder whether those problems of the calculus of variations which do not have extremals in the class of 𝒞 1 functions actually have extremals in the larger class of piecewise 𝒞 1 functions.

Definition 3.10: Piecewise 𝒞 1 functions

A real-valued function x ^ 𝒞 [ a , b ] is said to be piecewise 𝒞 1 , denoted x ^ 𝒞 ^ 1 [ a , b ] , if there is a finite (irreducible) partition a = c 0 < c 1 < < c N + 1 = b such that x ^ may be regarded as a function in 𝒞 1 [ c k , c k + 1 ] for each k = 0 , 1 , , N . When present, the interior points c 1 , , c N are called corner points of x ^ .

PIC

Figure 3.8:: Illustration of a piecewise continuously differentiable function x ^ 𝒞 ^ 1 [ a , b ] (thick red line), and its derivative x ^ ˙ (dash-dotted red line); without corners, x ^ may resemble the continuously differentiable function x 𝒞 1 [ a , b ] (dashed blue curve).

Some remarks are in order. Observe first that, when there are no corners, then x ^ 𝒞 1 [ a , b ] . Further, for any x ^ 𝒞 ^ 1 [ a , b ] , x ^ ˙ is defined and continuous on [ a , b ] except at its corner points c 1 , , c N where it has distinct limiting values x ^ ˙ ( c k ± ) ; such discontinuities are said to be simple, and x ^ ˙ is said to be piecewise continuous on [ a , b ] denoted x ^ ˙ 𝒞 ^ [ a , b ] . Figure 3.8 illustrates the effect of the discontinuities of x ^ ˙ in producing corners on the graph of x ^ . Without these corners, x ^ might resemble the 𝒞 1 function x whose graph is presented for comparison. In particular, each piecewise 𝒞 1 function is ”almost” 𝒞 1 , in the sense that it is only necessary to round out the corners to produce the graph of a 𝒞 1 function. These considerations are formalized by the following Lemma:

Lemma 3.4: Smoothing of Piecewise 𝒞 1 Functions

Let $\hat x \in \hat{\mathcal{C}}^1[a,b]$. Then, for each $\delta > 0$, there exists $x \in \mathcal{C}^1[a,b]$ such that $x \equiv \hat x$ except in a neighborhood $\mathcal{B}_\delta(c_k)$ of each corner point of $\hat x$. Moreover, $\|x - \hat x\|_\infty \leq K\,\delta$, where $K$ is a constant determined by $\hat x$.
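A minimal sketch of this smoothing in action (the quadratic blend and the constant $K = 1/2$ are our own illustration, not the lemma's general construction): the corner of $|t|$ can be rounded on $[-\delta, \delta]$ by a parabola matching value and slope at $\pm\delta$, and the sup-norm gap is exactly $\delta/2$.

```python
def smooth_abs(t, delta):
    # C¹ rounding of |t|: the parabola t²/(2δ) + δ/2 has value δ and
    # slope ±1 at t = ±δ, so the glued function is continuously differentiable
    if abs(t) >= delta:
        return abs(t)
    return t * t / (2.0 * delta) + delta / 2.0

deltas = (0.5, 0.1, 0.01)
grid = [-1.0 + 2.0 * k / 4000 for k in range(4001)]
# sup-norm distance between the smoothed function and |t| on [-1, 1]
gaps = [max(abs(smooth_abs(t, d) - abs(t)) for t in grid) for d in deltas]
```

Shrinking $\delta$ shrinks the gap proportionally, exactly as the lemma states.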

Likewise, we shall consider the class 𝒞 ^ 1 [ a , b ] n x of n x -dimensional vector valued analogue of 𝒞 ^ 1 [ a , b ] , consisting of those functions x ^ 𝒞 ^ 1 [ a , b ] n x with components x ^ j 𝒞 ^ 1 [ a , b ] , j = 1 , , n x . The corners of such x ^ are by definition those of any one of its components x ^ j . Note that the above lemma can be applied to each component of a given x ^ , and shows that x ^ can be approximated by a x 𝒞 1 [ a , b ] n x which agrees with it except in prescribed neighborhoods of its corner points. Both real valued and real vector valued classes of piecewise 𝒞 1 functions form linear spaces of which the subsets of 𝒞 1 functions are subspaces. Indeed, it is obvious that the constant multiple of one of these functions is another of the same kind, and the sum of two such functions exhibits the piecewise continuous differentiability with respect to a suitable partition of the underlying interval [ a , b ] . Since 𝒞 ^ 1 [ a , b ] 𝒞 [ a , b ] , we have:

$$\|x\|_\infty := \max_{a \leq t \leq b} |x(t)|$$

defines a norm on 𝒞 ^ 1 [ a , b ] . Moreover,

$$\|x\|_{1,\infty} := \max_{a \leq t \leq b} |x(t)| + \sup_{t \in \bigcup_{k=0}^{N}(c_k,\, c_{k+1})} |\dot x(t)|$$

can be shown to be another norm on $\hat{\mathcal{C}}^1[a,b]$, with $a = c_0 < c_1 < \dots < c_N < c_{N+1} = b$ being a suitable partition for $\hat x$. (The space of vector-valued piecewise $\mathcal{C}^1$ functions $\hat{\mathcal{C}}^1[a,b]^{n_x}$ can also be endowed with the norms $\|\cdot\|_\infty$ and $\|\cdot\|_{1,\infty}$.) By analogy with the linear space of $\mathcal{C}^1$ functions, the norms $\|\cdot\|_\infty$ and $\|\cdot\|_{1,\infty}$ are referred to as the strong norm and the weak norm, respectively; functions which are locally extremal with respect to the former [latter] norm are said to be strong [weak] extremal functions.
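The distinction matters because a function can be close to zero in the strong norm while far from it in the weak norm. A small sketch (grid-based norm estimates; the construction is our own example):

```python
import math

def strong_norm(x, n=4000):
    # ‖x‖∞ on [0, 1], estimated on a uniform grid
    return max(abs(x(i / n)) for i in range(n + 1))

def weak_norm(x, xdot, n=4000):
    # ‖x‖₁,∞ = ‖x‖∞ + sup |ẋ|, same grid estimate
    return strong_norm(x, n) + max(abs(xdot(i / n)) for i in range(n + 1))

k = 50
x  = lambda t: math.sin(k * k * t) / k    # tiny amplitude (≤ 1/50) ...
xd = lambda t: k * math.cos(k * k * t)    # ... but large derivative (up to 50)
s, w = strong_norm(x), weak_norm(x, xd)
```

Here $s \le 1/50$ while $w \ge 50$: the oscillation lies in every strong neighborhood of $0$ but in no small weak neighborhood, which is precisely why strong local minima are a more demanding notion.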

3.4.1 The Weierstrass-Erdmann Corner Conditions

A natural question that arises when considering the class of 𝒞 ^ 1 functions is whether a (local) extremal point for a functional in the class of 𝒞 1 functions also gives a (local) extremal point for this functional in the larger class of 𝒞 ^ 1 functions. We state the following Theorem:

Theorem 3.8: 𝒞 ^ 1 Extremals vs. 𝒞 1 Extremals

If x gives a [local] extremal point for the functional:

F [ x ] : = t 1 t 2 f ( t , x ( t ) , x ˙ ( t ) ) d t

on 𝒟 : = { x 𝒞 1 [ t 1 , t 2 ] n x : x ( t 1 ) = x 1 , x ( t 2 ) = x 2 } with
f 𝒞 ( [ t 1 , t 2 ] × 2 n x ) then x also gives a [local] extremal point for F
on $\hat{\mathcal{D}} := \{\hat x \in \hat{\mathcal{C}}^1[t_1,t_2]^{n_x} : \hat x(t_1) = x_1,\ \hat x(t_2) = x_2\}$, with respect to the same $\|\cdot\|_\infty$ or $\|\cdot\|_{1,\infty}$ norm.

On the other hand, a functional F may not have 𝒞 1 extremals, but be extremized by a 𝒞 ^ 1 function. We shall first seek for weak (local) extremals x ^ 𝒞 ^ 1 [ t 1 , t 2 ] n x , i.e., extremal trajectories with respect to some weak neighborhood δ 1 , ( x ^ ) .

  • Observe that a function lies in the weak neighborhood $\mathcal{B}^{1,\infty}_\delta(\hat x)$ if and only if it can be written as $\hat x + \alpha\hat\xi$ for some $\hat\xi \in \hat{\mathcal{C}}^1[t_1,t_2]^{n_x}$ and a sufficiently small $\alpha$. In characterizing (weak) local extremals for the functional
    F [ x ^ ] : = t 1 t 2 f ( t , x ^ ( t ) , x ^ ˙ ( t ) ) d t

    on 𝒟 ^ : = { x ^ 𝒞 ^ 1 [ t 1 , t 2 ] n x : x ^ ( t 1 ) = x 1 , x ^ ( t 2 ) = x 2 } , where f and its partials f x , f x ˙ are continuous on [ t 1 , t 2 ] × 2 n x , one can therefore duplicate the analysis of the previous section. This is done by splitting the integral into a finite sum of integrals with continuously differentiable integrands, then differentiating each under the integral sign. Overall, it can be shown that a (weak, local) extremal x ^ 𝒞 ^ 1 [ t 1 , t 2 ] n x must be stationary in intervals excluding corner points, i.e. the Euler equation is satisfied on [ t 1 , t 2 ] except at corner points c 1 , , c N of x ^ .

  • Likewise, both the Legendre second-order necessary conditions and the convexity sufficient conditions can be shown to hold on intervals excluding corner points of a $\hat{\mathcal{C}}^1$ extremal.
  • Finally, transversality conditions corresponding to the various free end-point problems remain the same. To see this most easily, e.g., in the case where freedom is permitted only at the right end-point, suppose that x ^ 𝒞 ^ 1 [ t 1 , t 2 ] n x gives a local extremal for 𝒟 ^ , and let c N be the right-most corner point of x ^ . Then, restricting comparison to those competing x ^ having their right-most corner point at c N and satisfying x ^ ( c N ) = x ^ ( c N ) , it is seen that the corresponding directions ( ξ ^ , τ ) must utilize the end-point freedom exactly as for 𝒞 1 functions. Thus, resulting in identical boundary conditions.

Besides necessary conditions of optimality on intervals excluding corner points c 1 , , c N of a local extremal x ^ 𝒞 ^ 1 [ t 1 , t 2 ] n x , the discontinuities of x ^ ˙ which are permitted at each c k are restricted. These are the so-called first Weierstrass-Erdmann corner conditions.

Theorem 3.9: First Weierstrass-Erdmann Corner Condition

Let x ^ ( t ) be a (weak) local extremal of the problem to minimize the functional

F [ x ^ ] : = t 1 t 2 f ( t , x ^ ( t ) , x ^ ˙ ( t ) ) d t

on 𝒟 ^ : = { x ^ 𝒞 ^ 1 [ t 1 , t 2 ] n x : x ^ ( t 1 ) = x 1 , x ^ ( t 2 ) = x 2 } , where f and its partials f x , f x ˙ are continuous on [ t 1 , t 2 ] × 2 n x . Then, at every (possible) corner point c [ t 1 , t 2 ] of x ^ , we have:

$$\frac{\partial f}{\partial \dot x}\bigl(c,\, \hat x(c),\, \dot{\hat x}(c^-)\bigr) = \frac{\partial f}{\partial \dot x}\bigl(c,\, \hat x(c),\, \dot{\hat x}(c^+)\bigr)$$

where x ^ ˙ ( c ) and x ^ ˙ ( c + ) denote the left and right time derivative of x ^ at c respectively.

Proof. Integrating both sides of Euler equation between t and t 1 for each component i = 1 , , n x gives:

$$\int_{t_1}^{t} \frac{d}{dt}\, f_{\dot x_i}\, dt = \int_{t_1}^{t} f_{x_i}\, dt \;\Rightarrow\; f_{\dot x_i} = \int_{t_1}^{t} f_{x_i}\, dt + C_i;$$

therefore the function $g(t) := f_{\dot x_i}\bigl(t, \hat x(t), \dot{\hat x}(t)\bigr)$ is continuous at each $t \in (t_1, t_2)$, even though $\dot{\hat x}(t)$ may be discontinuous at that point; that is, $g(c^-) = g(c^+)$. Moreover, $f_{\dot x_i}$ being continuous in its $1 + 2 n_x$ arguments, $\hat x(t)$ being continuous at $c$, and $\dot{\hat x}(t)$ having finite limits $\dot{\hat x}(c^\pm)$ at $c$, we get:

$$f_{\dot x_i}\bigl(c, \hat x(c), \dot{\hat x}(c^-)\bigr) = f_{\dot x_i}\bigl(c, \hat x(c), \dot{\hat x}(c^+)\bigr)$$

for each $i = 1, \dots, n_x$. $\square$

Remark 3.16

The first Weierstrass-Erdmann condition of Theorem 3.9 shows that the discontinuities of x ^ ˙ which are permitted at corner points of a local extremal trajectory x ^ 𝒞 ^ 1 [ t 1 , t 2 ] n x are those which preserve the continuity of f x ˙ . Likewise, it can be shown that the continuity of the Hamiltonian H : = f f x ˙ x ˙ must be preserved at corner points of x ^ that is:

H ( c , x ^ ( c ) , x ^ ˙ ( c ) ) = H ( c , x ^ ( c ) , x ^ ˙ ( c + ) )

which yields the so-called second Weierstrass-Erdmann corner condition.

Example 3.12. Consider the problem to minimize the functional:

$$F[x] = \int_{-1}^{1} x^2(t)\,\bigl(1 - \dot x(t)\bigr)^2\, dt \quad \text{s.t.} \quad x \in \mathcal{D},$$

where $\mathcal{D} := \{x \in \mathcal{C}^1[-1,1] : x(-1) = 0,\ x(1) = 1\}$. The Lagrangian $f = x^2(t)\,(1 - \dot x(t))^2$ is independent of the independent variable $t$; we thus have constancy of the Hamiltonian along a stationary trajectory. Thus:

$$H = f - f_{\dot x}\,\dot x = x^2(t)\,\bigl(1 - \dot x(t)\bigr)^2 - \bigl[2\,x^2(t)\,\bigl(\dot x(t) - 1\bigr)\bigr]\,\dot x(t) = c \qquad \forall t \in [-1, 1]$$

for some constant $c$. Upon simplification, we get:

$$x^2(t)\,\bigl(1 - \dot x^2(t)\bigr) = c \qquad \forall t \in [-1, 1];$$

in order to solve this differential equation we make the substitution $u(t) := x^2(t)$, whence $\dot u(t) = 2\,x(t)\,\dot x(t)$; we get the somewhat simpler equation $\dot u(t)^2 = 4\,\bigl(u(t) - c\bigr)$, which can be solved by separation of variables and has general solution:

u ( t ) : = ( t + k ) 2 + c

where $k$ is a constant of integration. Substituting back $x^2(t) = u(t)$, we conclude that a stationary point $\bar x$ must be of the form:

$$\bar x(t)^2 = (t + k)^2 + c.$$

In particular, the boundary conditions $x(-1) = 0$ and $x(1) = 1$ produce the constants $c = -\bigl(\tfrac{3}{4}\bigr)^2$ and $k = \tfrac{1}{4}$. However, the resulting stationary function:

$$\bar x(t) = \sqrt{\Bigl(t + \tfrac{1}{4}\Bigr)^2 - \Bigl(\tfrac{3}{4}\Bigr)^2} = \sqrt{(t+1)\Bigl(t - \tfrac{1}{2}\Bigr)}$$

is defined only for $t \geq \tfrac{1}{2}$ or $t \leq -1$. Thus, there is no stationary function for the Lagrangian $f$ in $\mathcal{D}$. Next, we turn to the problem of minimizing $F$ on the larger set $\hat{\mathcal{D}} := \{\hat x \in \hat{\mathcal{C}}^1[-1,1] : \hat x(-1) = 0,\ \hat x(1) = 1\}$. Suppose that $\hat x^*$ is a local minimizer for $F$ on $\hat{\mathcal{D}}$. Then, by the Weierstrass-Erdmann condition, we must have:

$$f_{\dot x}\bigl(c, \hat x(c), \dot{\hat x}(c^-)\bigr) = f_{\dot x}\bigl(c, \hat x(c), \dot{\hat x}(c^+)\bigr) \;\Leftrightarrow\; -2\,\hat x^2(c)\,\bigl[1 - \dot{\hat x}(c^-)\bigr] = -2\,\hat x^2(c)\,\bigl[1 - \dot{\hat x}(c^+)\bigr],$$

which gives:

$$\hat x^2(c)\,\bigl[\dot{\hat x}(c^+) - \dot{\hat x}(c^-)\bigr] = 0;$$

by definition of corner points we must have $\dot{\hat x}(c^+) \neq \dot{\hat x}(c^-)$, hence corner points are only allowed at those $c \in (-1, 1)$ such that $\hat x(c) = 0$. Observe that $f = x^2(t)\,(1 - \dot x(t))^2 \geq 0$, and thus the functional is bounded below by $0$. This means that if we can find a function $\bar x \in \hat{\mathcal{D}}$ that satisfies the necessary conditions and achieves $F[\bar x] = 0$, then $\bar x = x^*$ is a global minimizer of the problem. From the boundary conditions we have $\bar x(-1) = 0$ and $\bar x(1) = 1$, and from the Weierstrass-Erdmann condition corner points may occur only at time instants $c$ where $\bar x(c) = 0$. We can then construct a function $\bar x$ that is $0$ from $-1$ to $0$, where it has a corner point, and then grows linearly with constant derivative equal to $1$, so that $\bar x(1) = 1$. More precisely:

$$\bar x(t) = \begin{cases} 0, & -1 \leq t \leq 0 \\ t, & 0 < t \leq 1 \end{cases}$$

Such a function achieves the lower bound $F[\bar x] = 0$ and satisfies the necessary conditions (it has a corner point at $0$, where $\bar x(0) = 0$); hence it is the unique global minimum point for $F$ on $\hat{\mathcal{D}}$. The global optimum is plotted in Figure 3.9.
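We can sanity-check this numerically (a quadrature sketch; helper names are ours): the corner solution gives exactly zero cost, while a smooth competitor such as the straight line from $(-1, 0)$ to $(1, 1)$ gives $F = 1/6$.

```python
def F(x, xdot, n=20000):
    # trapezoid rule for  ∫_{-1}^{1} x(t)² (1 - ẋ(t))² dt
    h = 2.0 / n
    def g(t):
        return x(t) ** 2 * (1.0 - xdot(t)) ** 2
    s = 0.5 * (g(-1.0) + g(1.0)) + sum(g(-1.0 + k * h) for k in range(1, n))
    return s * h

# corner extremal: 0 on [-1, 0], then t on (0, 1]
corner = F(lambda t: max(t, 0.0), lambda t: 1.0 if t > 0 else 0.0)
# smooth competitor: straight line x(t) = (t + 1)/2
straight = F(lambda t: (t + 1.0) / 2.0, lambda t: 0.5)
```

The integrand vanishes identically along the corner solution (either $x = 0$ or $\dot x = 1$ at every $t$), so no smooth curve can do better.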


Figure 3.9: Globally minimizing trajectory; note the corner point $c = 0$.

Corollary 3.1: Absence of corner points

Consider the problem to minimize the functional:

F [ x ^ ] : = t 1 t 2 f ( t , x ^ ( t ) , x ^ ˙ ( t ) ) d t

on $\hat{\mathcal{D}} := \{\hat x \in \hat{\mathcal{C}}^1[t_1,t_2]^{n_x} : \hat x(t_1) = x_1,\ \hat x(t_2) = x_2\}$. If $f_{\dot x}(t,y,z)$ is a strictly monotone function of $z \in \mathbb{R}^{n_x}$ (or, equivalently, $f(t,y,z)$ is a strictly convex function of $z$ on $\mathbb{R}^{n_x}$) for each $(t,y) \in [t_1,t_2] \times \mathbb{R}^{n_x}$, then an extremal solution $\hat x(t)$ cannot have corner points.

Proof. By the first Weierstrass-Erdmann corner condition, the following holds at a corner point:

$$f_{\dot x}\bigl(c, \hat x(c), \dot{\hat x}(c^-)\bigr) = f_{\dot x}\bigl(c, \hat x(c), \dot{\hat x}(c^+)\bigr).$$

Let us define the vector-valued function $k(z) := f_{\dot x}\bigl(c, \hat x(c), z\bigr)$. By hypothesis $k$ is strictly monotone in $z$ and thus cannot assume the same value twice. The Weierstrass-Erdmann condition can be written in terms of $k$ as:

$$k\bigl(\dot{\hat x}(c^-)\bigr) = k\bigl(\dot{\hat x}(c^+)\bigr), \qquad \text{i.e.,} \quad k(z_1) = k(z_2),$$

but by definition of a corner point $z_1 \neq z_2$, contradicting the strict monotonicity of $k$. $\square$

Example 3.13 (Minimum Path Problems). Consider again the problem in Example 3.8, that is, to minimize the distance between two fixed points, namely $A = (x_1, y_1)$ and $B = (x_2, y_2)$, in the $(x,y)$-plane. We have shown that extremal trajectories for this problem correspond to straight lines. But could we have extremal trajectories with corner points? The answer is no: the Lagrangian is $f = \sqrt{1 + \dot y^2}$ and $f_{\dot y} = \frac{\dot y}{\sqrt{1 + \dot y^2}}$ is a strictly monotone function of $\dot y$; hence we can apply Corollary 3.1 and conclude that extremal trajectories for this problem cannot have corner points.
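For this Lagrangian the monotonicity is easy to verify directly; a one-line numerical check (our own sketch) of $k(z) = z/\sqrt{1+z^2}$:

```python
import math

def k(z):
    # k(z) = f_{y'} for f = sqrt(1 + z²); its derivative is (1 + z²)^(-3/2) > 0
    return z / math.sqrt(1.0 + z * z)

zs = [-5.0 + 0.1 * i for i in range(101)]   # sample grid on [-5, 5]
vals = [k(z) for z in zs]                   # strictly increasing sequence
```

Since $k$ is injective, the corner condition $k(z_1) = k(z_2)$ forces $z_1 = z_2$, ruling out corners, exactly as Corollary 3.1 asserts.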

3.4.2 Weierstrass’ Necessary Conditions: Strong Minima

The Gâteaux derivatives of a functional are obtained by comparing its value at a point x with those at points x + α ξ in a weak norm neighborhood. In contrast to these (weak) variations, we now consider a new type of (strong) variations whose smallness does not imply that of their derivatives. In the scalar case 6 , we consider variations Ŵ 𝒞 ^ 1 [ t 1 , t 2 ] defined as:

Ŵ ( t ) : = { v ( t τ + δ ) if τ δ t τ v ( δ ( t τ ) + τ ) if τ t < τ + δ 0 otherwise

where τ ∈ (t₁, t₂), and v and δ are positive real coefficients such that τ − δ > t₁ and τ + δ < t₂.


Figure 3.10:: Strong variation Ŵ ( t ) and its time derivative Ŵ ˙ ( t )

Note that strong variations of this kind depend on three parameters: v, δ, and τ. Informally speaking, τ is the point at which the perturbation is centered, while δ determines its extension in time. Note that the conditions τ − δ > t₁ and τ + δ < t₂ constrain the variation to lie within the open interval (t₁, t₂). The parameter v modulates the magnitude of the variation and of its derivative. We are now ready to state a set of necessary conditions for a strong local minimum, whose proof is based on the foregoing class of variations.

Theorem 3.10: Weierstrass’ Necessary Condition

Consider the problem to minimize the functional:

F [ x ^ ] : = t 1 t 2 f ( t , x ^ ( t ) , x ^ ˙ ( t ) ) d t

on 𝒟 ^ : = { x ^ 𝒞 ^ 1 [ t 1 , t 2 ] n x : x ^ ( t 1 ) = x 1 , x ^ ( t 2 ) = x 2 } . Suppose x ^ ( t ) , t 1 t t 2 , gives a strong (local) minimum for F on 𝒟 ^ . Then,

ℰ(t, x̂*, x̂̇*, v) := f(t, x̂*, x̂̇* + v) − f(t, x̂*, x̂̇*) − f_ẋ(t, x̂*, x̂̇*)ᵀ v ≥ 0

at every t ∈ [t₁, t₂] and for each v ∈ ℝ^{n_x}. (ℰ is referred to as the excess function of Weierstrass.)

Proof. For the sake of clarity, we shall present and prove this condition for scalar functions x̂ ∈ 𝒞̂¹[t₁, t₂] only. Let x̂_δ(t) := x̂*(t) + Ŵ(t). Since both Ŵ and x̂* are 𝒞̂¹ functions, so is x̂_δ. These smoothness conditions are sufficient to calculate F[x̂_δ], as well as its derivative with respect to δ at δ = 0. Note that x̂_δ and x̂* differ only on the interval [τ − δ, τ + δ]. Hence, by the definition of Ŵ, we have:

F[x̂_δ] − F[x̂*] = ∫_{τ−δ}^{τ+δ} ( f(t, x̂_δ(t), x̂̇_δ(t)) − f(t, x̂*(t), x̂̇*(t)) ) dt = ∫_{τ}^{τ+δ} f(t, x̂_δ(t), x̂̇_δ(t)) dt + ∫_{τ−δ}^{τ} f(t, x̂_δ(t), x̂̇_δ(t)) dt − ∫_{τ−δ}^{τ+δ} f(t, x̂*(t), x̂̇*(t)) dt,

note that we have split the integral at the points of discontinuity of the derivative of Ŵ(t). The difference quotient is:

F [ x ^ δ ] F [ x ^ ] δ = 1 δ τ δ τ ( f ( t , x ^ ( t ) + v ( t τ + δ ) , x ^ ˙ ( t ) + v ) f ( t , x ^ ( t ) , x ^ ˙ ( t ) ) ) d t + 1 δ { τ τ + δ f ( t , x ^ ( t ) + v ( δ ( t τ ) + τ ) , x ^ ˙ ( t ) v δ ) d t + τ τ + δ f ( t , x ^ ( t ) , x ^ ˙ ( t ) ) d t } = I 1 δ + I 2 δ .
I₁⁰ = lim_{δ→0} I₁^δ = lim_{δ→0} (1/δ) { ∫_{τ−δ}^{τ} f(t, x̂*(t) + v(t − τ + δ), x̂̇*(t) + v) dt − ∫_{τ−δ}^{τ} f(t, x̂*(t), x̂̇*(t)) dt } = f(τ, x̂*(τ), x̂̇*(τ) + v) − f(τ, x̂*(τ), x̂̇*(τ)),

where we have applied Theorem 3.6. In order to analyse the second term we define the function g(t) := v((t − τ) + δ); its time derivative is ġ(t) = v, and thus we have:

I 2 0 = lim δ 0 I 2 δ = lim δ 0 1 δ { τ τ + δ f ( t , x ^ ( t ) + δ g ( t ) , x ^ ˙ ( t ) + δ ġ ) d t + τ τ + δ f ( t , x ^ ( t ) , x ^ ˙ ( t ) ) d t } ,

using the first-order Taylor series expansion in δ under the integral sign we have:

f ( t , x ^ ( t ) + δ g ( t ) , x ^ ˙ ( t ) + δ ġ ) = = f [ x ^ ] + f [ x ^ ] x g ( t ) δ + f [ x ^ ] ġ ( t ) δ + o ( δ )

where the arguments have been compressed for notational simplicity. Therefore we have:

lim δ 0 I 2 δ = lim δ 0 1 δ τ τ + δ ( f [ x ^ ] x g ( t ) + f [ x ^ ] ġ ( t ) + o ( δ ) ) d t ,

upon integration by parts of the term involving ġ ( t ) we obtain:

lim δ 0 I 2 δ = lim δ 0 { 1 δ τ τ + δ ( f [ x ^ ] x d d t f [ x ^ ] ) g ( t ) d t + + 1 δ ( f [ x ^ ] g ) | τ τ + δ + o ( δ ) δ } ,

note that the term f_x[x̂*] − (d/dt) f_ẋ[x̂*] = 0, since the Euler equation is a necessary condition for optimality. Then, by the definition of little-o, we have lim_{δ→0} o(δ)/δ = 0; finally, the last term reduces to:

(1/δ) ( f_ẋ[x̂*] g ) |_{τ}^{τ+δ} = f_ẋ[x̂*(τ)] v,

since g(τ) = 0 and g(τ + δ) = vδ. Since x̂* is a strong local minimizer, we have F[x̂_δ] ≥ F[x̂*] for all sufficiently small δ, and hence also in the limit

0 ≤ lim_{δ→0} (F[x̂_δ] − F[x̂*])/δ = I₁⁰ + I₂⁰,

and thus:

f(τ, x̂*(τ), x̂̇*(τ) + v) − f(τ, x̂*(τ), x̂̇*(τ)) − f_ẋ(τ, x̂*(τ), x̂̇*(τ)) v ≥ 0

for every τ and v . □

Example 3.14 (Minimum path problem II). Consider again the problem in Example 3.8 that is to minimize the distance between two fixed points, namely A = ( x 1 , y 1 ) and B = ( x 2 , y 2 ) in the ( x , y ) -plane. We have shown that extremal trajectories for this problem correspond to straight lines joining A and B , that is:

y ( x ) = C 1 x + C 2

where C₁ = (y₂ − y₁)/(x₂ − x₁) and C₂ = y₁. We now ask whether y* is also a strong minimum for this problem. The Weierstrass excess function is:

ℰ(x, y, y′, v) = √(1 + (C₁ + v)²) − √(1 + C₁²) − C₁ v / √(1 + C₁²),

note that in this simple case it can be regarded as a function of C₁ and v only. The excess function is plotted as a function of v for different values of C₁ in Figure 3.11. It is easily checked that it is always nonnegative; therefore y* is also a strong local minimum for the minimum path problem. In fact, we note that the function g(z) = √(1 + z²) is convex⁷. The Weierstrass condition is equivalent to:

g ( v + C 1 ) g ( C 1 ) d g d z | z = C 1 v 0

which holds true for every C₁ and v, being precisely the first-order characterization of convexity of g.


Figure 3.11:: Weierstrass excess function for the minimum path problem as a function of v for different coefficients C₁.
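A quick numerical sweep, mirroring the plot of Figure 3.11, confirms the nonnegativity of the excess function over a grid of slopes C₁ and perturbations v (a sanity check, not a proof):

```python
import math

# Weierstrass excess function for f(z) = sqrt(1 + z^2), evaluated along the
# straight-line extremal with slope C1 (cf. the formula in the example).
def excess(C1, v):
    g = lambda z: math.sqrt(1.0 + z * z)
    return g(C1 + v) - g(C1) - C1 * v / math.sqrt(1.0 + C1 * C1)

# Sample slopes in [-5, 5] and perturbations in [-10, 10].
vals = [excess(C1 / 4.0, v / 4.0) for C1 in range(-20, 21) for v in range(-40, 41)]
print(min(vals) >= -1e-9)  # True: consistent with the convexity of sqrt(1 + z^2)
```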

The following corollary indicates that the Weierstrass condition is also useful to detect strong (local) minima in the class of 𝒞 1 functions.

Corollary 3.2: Weierstrass’ Necessary Condition

Consider the problem to minimize the functional:

min x ( t ) F [ x ] = t 1 t 2 f ( t , x ( t ) , x ˙ ( t ) ) d t s . t . x 𝒟

where the functional space is defined as 𝒟 = { x 𝒞 1 [ t 1 , t 2 ] such that x ( t 1 ) = x 1 , x ( t 2 ) = x 2 } and f : × n x × n x a continuously differentiable function. Suppose that x 𝒟 gives a (local) minimum for F on 𝒟 .
Then :

ℰ(t, x̂*, x̂̇*, v) := f(t, x̂*, x̂̇* + v) − f(t, x̂*, x̂̇*) − f_ẋ(t, x̂*, x̂̇*)ᵀ v ≥ 0

at every t [ t 1 , t 2 ] and for each v n x .

Proof. By Theorem 3.8 we have that x is a local strong minimizer on 𝒟 ^ as well, therefore Theorem 3.10 holds. □

Remark 3.17: Weierstrass’ Condition and Convexity

It is readily seen that the Weierstrass condition of Theorem 3.10 is satisfied automatically when the Lagrangian function f ( t , y , z ) is partially convex (and continuously differentiable) in z n x , for each ( t , y ) [ t 1 , t 2 ] × n x .

Remark 3.18: Weierstrass’ Condition and Pontryagin’s Maximum Principle

Interestingly enough, the Weierstrass’ condition can be rewritten as:

f(t, x*(t), ẋ*(t) + v) − f_ẋ(t, x*(t), ẋ*(t))ᵀ (ẋ*(t) + v) ≥ f(t, x*(t), ẋ*(t)) − f_ẋ(t, x*(t), ẋ*(t))ᵀ ẋ*(t)

which given the definition of Hamiltonian gives:

H(t, x*(t), ẋ*(t) + v) ≤ H(t, x*(t), ẋ*(t))

for each t [ t 1 , t 2 ] and for each v n x . This necessary condition prefigures Pontryagin’s Maximum Principle in optimal control theory.

3.5 Problems with equality and inequality constraints

We now turn our attention to problems of the calculus of variations in the presence of constraints. Similar to finite-dimensional optimization problems, it is more convenient to use Lagrange multipliers in order to derive the necessary conditions of optimality associated with such problems; these considerations are discussed in the following for equality constraints. The method of Lagrange multipliers is then used to obtain necessary conditions of optimality for problems subject to (equality) terminal constraints and isoperimetric constraints, respectively. The Lagrange multiplier methodology is then extended to inequality-constrained problems.

We start by considering a general class of equality constraints where the function space 𝒟 is defined by level-set curves of one or more functionals G₁, …, G_{n_g}.

Definition 3.11: Equality constrained calculus of variations problem

Consider the problem to minimize:

min x ( t ) F [ x ] = t 1 t 2 f ( t , x ( t ) , x ˙ ( t ) ) d t s . t . x 𝒟 𝒳

where 𝒟 : = { x 𝒳 such that G i [ x ] = K i i = 1 , , n g } .

The functionals G i are defined in 𝒳 as well.

Remark 3.19

Note that the basic problem of the calculus of variations that we have addressed so far also falls into this broader category. Consider the functionals G₁[x] = x(t₁) and G₂[x] = x(t₂), and recall that for the basic problem we had 𝒟 := { x ∈ 𝒞¹[t₁, t₂]^{n_x} such that x(t₁) = x₁, x(t₂) = x₂ }. We can define the same 𝒟 as 𝒟 := { x ∈ 𝒞¹[t₁, t₂]^{n_x} such that G₁[x] = x₁ and G₂[x] = x₂ }.

Remark 3.20: Common types of constraint functionals

This kind of problem allows one to treat more general settings such as endpoint-constrained problems and isoperimetric problems. In the endpoint-constrained case the constraint functionals G_i take the form G_i[x] = ψ_i(t₂, x(t₂)) for i = 1, …, n_x. The constraint reduces to G_i[x] = 0, i = 1, …, n_x; in this way the final state is forced to satisfy ψ(t₂, x(t₂)) = 0. Isoperimetric problems are a class of problems where the constraint functionals are defined as G[x] = ∫_{t₁}^{t₂} g(t, x, ẋ) dt and the constraint is expressed as the level set G[x] = L, where L ∈ ℝ. These kinds of constraints are found in extremal area [volume] problems with fixed perimeter [area], as for example the catenary or Dido's problem, which we will solve in the following sections.

The next theorem, which we give without proof, is instrumental in proving a fundamental lemma on the existence of minimizers of such equality-constrained problems.

Theorem 3.11: Inverse Function Theorem

Let x₀ ∈ ℝ^{n_x} and η > 0. If a function Φ : ℬ_η(x₀) → ℝ^{n_x} has continuous first partial derivatives in each component with nonvanishing Jacobian determinant at x₀, then Φ provides a continuously invertible mapping between ℬ_η(x₀) and a region containing a full neighborhood of Φ(x₀).

The following Lemma gives conditions under which a point x ¯ in a normed linear space ( 𝒳 ; ) cannot be a (local) extremal of a functional F , when constrained to the level set of another functional G .

Lemma 3.5

Let F and G be functionals defined in a neighborhood of x ¯ in a normed linear space ( 𝒳 ; ) , and let G [ x ] = K be the equality constraint defining 𝒟 . Suppose that there exist fixed directions ξ 1 , ξ 2 such that the Gâteaux derivatives of F and G satisfy the Jacobian condition:

| δ F x ¯ [ ξ 1 ] δ F x ¯ [ ξ 2 ] δ G x ¯ [ ξ 1 ] δ G x ¯ [ ξ 2 ] | 0

and are continuous in a neighborhood of x ¯ (in each direction ξ 1 , ξ 2 ). Then, x ¯ cannot be a local extremal for F when constrained to 𝒟 : = { x 𝒳 such that G [ x ] = K } .

Proof. Consider the auxiliary functions j ( η 1 , η 2 ) : = F [ x ¯ + η 1 ξ 1 + η 2 ξ 2 ] and g ( η 1 , η 2 ) : = G [ x ¯ + η 1 ξ 1 + η 2 ξ 2 ] , we have j ( 0 , 0 ) = F [ x ¯ ] and g ( 0 , 0 ) = G [ x ¯ ] = K and by definition of Gâteaux derivative δ F x ¯ [ ξ 1 ] = j η 1 | 0 , 0 = j η 1 ( 0 , 0 ) , δ F x ¯ [ ξ 2 ] = j η 2 | 0 , 0 = j η 2 ( 0 , 0 ) , δ G x ¯ [ ξ 1 ] = g η 1 | 0 , 0 = g η 1 ( 0 , 0 ) and δ G x ¯ [ ξ 2 ] = g η 2 | 0 , 0 = g η 2 ( 0 , 0 ) . Let us also define the function Φ : 2 2 as:

Φ = [ j ( η 1 , η 2 ) g ( η 1 , η 2 ) ] ,

the Jacobian determinant of Φ at (0, 0) is:

Det ( J Φ ( 0 , 0 ) ) = | j η 1 ( 0 , 0 ) j η 2 ( 0 , 0 ) g η 1 ( 0 , 0 ) g η 2 ( 0 , 0 ) | = | δ F x ¯ [ ξ 1 ] δ F x ¯ [ ξ 2 ] δ G x ¯ [ ξ 1 ] δ G x ¯ [ ξ 2 ] | ,

by hypothesis we have Det(J_Φ(0, 0)) ≠ 0, and hence Theorem 3.11 is applicable. Therefore Φ maps a full neighborhood of the origin (0, 0) in the η₁-η₂ plane onto a region containing a full neighborhood of Φ(0, 0) = [j(0, 0), g(0, 0)]ᵀ = [F[x̄], K]ᵀ, as shown in Figure 3.12. That means that we can find (η₁^×, η₂^×) and (η₁°, η₂°) such that:

j ( η 1 o , η 2 o ) < j ( 0 , 0 ) < j ( η 1 × , η 2 × ) g ( η 1 o , η 2 o ) = g ( 0 , 0 ) = g ( η 1 × , η 2 × ) = K ,

by definition of the functions g and j we have:

F[x̄ + η₁°ξ₁ + η₂°ξ₂] < F[x̄] < F[x̄ + η₁^×ξ₁ + η₂^×ξ₂],  G[x̄ + η₁°ξ₁ + η₂°ξ₂] = G[x̄] = G[x̄ + η₁^×ξ₁ + η₂^×ξ₂] = K,

which means that x̃ = x̄ + η₁°ξ₁ + η₂°ξ₂ is admissible (i.e., it satisfies the constraint G[x̃] = K) and reduces the cost, that is, F[x̃] < F[x̄]; hence x̄ cannot be a local extremal trajectory. □


Figure 3.12:: Due to the inverse function theorem there exists a full neighborhood of (j(0,0), g(0,0)); hence it is possible to find admissible trajectories with lower cost

With this preparation, it is easy to derive necessary conditions for a local extremal in the presence of equality or inequality constraints, and in particular the existence of the Lagrange multipliers.

Theorem 3.12: Existence of the Lagrange Multipliers

Let F and G be functionals defined in a neighborhood of x in a normed linear space ( 𝒳 , ) , and having continuous Gâteaux derivatives in this neighborhood. Let also K = G ( x ) and suppose that x is a (local) extremal for F constrained to Γ ( K ) : = { x 𝒳 : G ( x ) = K } . Suppose further that δ G x [ ξ ¯ ] 0 for some direction ξ ¯ 𝒳 . Then, there exists a scalar λ such that:

δF_{x*}[ξ] + λ δG_{x*}[ξ] = 0  ∀ξ ∈ 𝒳. (Note that the variations need not be admissible for the space Γ(K).)

Proof. Since x is a (local) extremal for F constrained to Γ ( K ) by Lemma 3.5 we must have that the determinant:

| δ F x [ ξ ¯ ] δ F x [ ξ ] δ G x [ ξ ¯ ] δ G x [ ξ ] | = 0

for any ξ ∈ 𝒳. Hence, by defining:

λ := − δF_{x*}[ξ̄] / δG_{x*}[ξ̄]

it follows that δ F x [ ξ ] + λ δ G x [ ξ ] = 0 for each ξ 𝒳 . □

Remark 3.21

As in the finite-dimensional case, the parameter λ in Theorem 3.12 is called a Lagrange multiplier. Using the terminology of directional derivatives appropriate to ℝ^{n_x}, the Lagrange condition δF_{x*}[ξ] = −λ δG_{x*}[ξ] says simply that the directional derivatives of F are proportional to those of G at x*. Thus, in general, Lagrange's condition means that the level sets of F and G share the same tangent hyperplane at x*, i.e., they meet tangentially. Note also that Lagrange's condition can also be written in the form

δ ( F + λ G ) x [ ξ ] = 0 ξ 𝒳

This is due to the linearity of the Gâteaux derivative: given two scalars a₁, a₂ and two functionals A, B from 𝒳 to ℝ, we have δ(a₁A + a₂B)_x[ξ] = a₁ δA_x[ξ] + a₂ δB_x[ξ], as shown in Remark 3.7. This suggests, as we will see in the following, consideration of the Lagrangian functional L := F + λG.
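The linearity property can be illustrated with finite differences on discretized curves; the functionals F and G below are illustrative choices, not taken from the text:

```python
# Finite-difference check of the linearity of the Gateaux derivative:
# delta(a1*F + a2*G)_x[xi] = a1*deltaF_x[xi] + a2*deltaG_x[xi].
N = 1000
h = 1.0 / N
ts = [k * h for k in range(N + 1)]
x = [t * t for t in ts]           # base curve x(t) = t^2 on [0, 1]
xi = [t * (1 - t) for t in ts]    # direction with xi(0) = xi(1) = 0

def F(y):  # trapezoidal approximation of the integral of y^2
    return h * (sum(v * v for v in y) - 0.5 * (y[0] ** 2 + y[-1] ** 2))

def G(y):  # trapezoidal approximation of the integral of y
    return h * (sum(y) - 0.5 * (y[0] + y[-1]))

def gateaux(J, y, d, eps=1e-6):
    # Central difference approximation of the Gateaux derivative of J at y in direction d.
    yp = [a + eps * b for a, b in zip(y, d)]
    ym = [a - eps * b for a, b in zip(y, d)]
    return (J(yp) - J(ym)) / (2 * eps)

a1, a2 = 2.0, -3.0
lhs = gateaux(lambda y: a1 * F(y) + a2 * G(y), x, xi)
rhs = a1 * gateaux(F, x, xi) + a2 * gateaux(G, x, xi)
print(abs(lhs - rhs) < 1e-8)  # True
```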

It is possible, albeit technical, to extend the method of Lagrange multipliers to problems involving any finite number of constraint functionals:

Theorem 3.13: Existence of the Lagrange Multipliers (Multiple Constraints)

Let F and G i , i = 1 , n g be functionals defined in a neighborhood of x in a normed linear space ( 𝒳 , ) and having continuous Gâteaux derivatives in this neighborhood. Let also K i = G i [ x ] , i = 1 , , n g and suppose that x is a (local) extremal for F constrained to Γ ( K ) : = { x 𝒳 : G i [ x ] = K i , i = 1 , , n g } . Suppose further that:

| δ G 1 , x [ ξ ¯ 1 ] δ G 1 , x [ ξ ¯ n g ] δ G n g , x [ ξ ¯ 1 ] δ G n g , x [ ξ ¯ n g ] | 0

for n g (independent) directions ξ ¯ 1 , , ξ ¯ n g 𝒳 . Then, there exists a constant vector λ n g such that:

δ F x [ ξ ] + i = 1 n g λ i δ G i , x [ ξ ] = 0 ξ 𝒳

Remark 3.22: Link to Nonlinear Optimization

The previous theorem is the generalization of Theorem 2.13 of Chapter 2 to optimization problems in normed linear spaces. Note, in particular, that the requirement that x* be a regular point for the Lagrange multipliers to exist is generalized by a non-singularity condition in terms of the Gâteaux derivatives of the constraint functionals G₁, …, G_{n_g}. Yet, this condition is in general not sufficient to guarantee uniqueness of the Lagrange multipliers.

Remark 3.23: Hybrid Method of Admissible Directions and Lagrange Multipliers

If x ∈ 𝒟, with 𝒟 a subset of a normed linear space (𝒳, ∥·∥), and the 𝒟-admissible directions form a linear subspace of 𝒳 (i.e., ξ₁, ξ₂ ∈ 𝒟 ⟹ η₁ξ₁ + η₂ξ₂ ∈ 𝒟 for all scalars η₁, η₂), then the conclusions of Theorem 3.12 remain valid when further restricting the continuity requirement for F to 𝒟 and considering 𝒟-admissible directions only. Said differently, Theorem 3.12 can be applied to determine (local) extremals of the functional F|_𝒟 constrained to Γ(K). This extension leads to a more efficient, but admittedly hybrid, approach to certain problems involving multiple constraints. Those constraints which determine a domain 𝒟 having a linear subspace of 𝒟-admissible directions may be taken into account by simply restricting the set of admissible directions when applying the method of Lagrange multipliers to the remaining constraints.

Necessary conditions of optimality for problems with end-point constraints and isoperimetric constraints shall be obtained with this hybrid approach in the next subsections.

3.5.1 Problems with end-point constraints

So far, we have only considered problems with either free or fixed end-times t₁, t₂ and end-points x(t₁), x(t₂). In this subsection, we shall consider problems having end-point constraints of the form ϕ(t₂, x(t₂)) = 0, with t₂ being specified or not. In case t₂ is free, it shall be considered as an additional variable in the optimization problem. As in Section 3.3.1, we shall then define the functions x(t) by extension on a "sufficiently" large interval [t₁, T], and consider the linear space 𝒞¹[t₁, T]^{n_x} × ℝ, supplied with the weak norm ∥(x, t)∥₁,∞ := ∥x∥₁,∞ + |t|. In particular, Theorem 3.13 applies readily by specializing the normed linear space (𝒳, ∥·∥) to (𝒞¹[t₁, T]^{n_x} × ℝ, ∥(·, ·)∥₁,∞) and considering the Gâteaux derivative δF_{x*, t₂*}[ξ, τ] at (x*, t₂*) in the direction (ξ, τ). These considerations yield necessary conditions of optimality for problems with end-point constraints, as given in the following:

Theorem 3.14: Transversal Conditions

Consider the problem to minimize the functional

F [ x , t 2 ] : = t 1 t 2 f ( t , x ( t ) , x ˙ ( t ) ) d t

on 𝒟 : = { ( x , t 2 ) 𝒞 1 [ t 1 , T ] n x × [ t 1 , T ] : x ( t 1 ) = x 1 , ϕ ( t 2 , x ( t 2 ) ) = 0 } with f 𝒞 1 ( [ t 1 , T ] × 2 × n x ) and ϕ 𝒞 1 ( [ t 1 , T ] × n x ) n x . Suppose that ( x , t 2 ) gives a (local) minimum for F on 𝒟 and Rank [ ϕ t ϕ x ] = n x at ( x ( t 2 ) , t 2 ) . Then, x is a solution to the Euler equation of Theorem 3.3 satisfying both the end-point constraints x ( t 1 ) = x 1 and the transversal condition:

[ ( f f x ˙ x ˙ ) d t + f x ˙ d x ] x , t 2 = 0 [ d t d x ] Ker [ ϕ t ϕ x ] x , t 2

in the particular one-dimensional case (i.e. n x = 1 ) the transversal condition reduces to:

[ ϕ_x (f − ẋ f_ẋ) − ϕ_t f_ẋ ]_{x*, t₂*} = 0.

Proof. Observe first that by fixing t 2 : = t 2 and varying x in the 𝒟 -admissible direction ξ 𝒞 1 [ t 1 , t 2 ] n x such that ξ ( t 1 ) = ξ ( t 2 ) = 0 we show as in the proof of Theorem 3.3 that x must be a solution to the Euler equation on [ t 1 , t 2 ] . Observe also that the right end-point constraints may be expressed as the zero-level set of the functionals:

G k [ x ] : = ϕ k ( t 2 , x ( t 2 ) ) = 0 k = 1 , , n x ,

then simplifying the terms using the Euler equation as in the proof of Theorem 3.7 we obtain the first variations of the cost and constraint functionals as:

δF_{(x*, t₂*)}[ξ, τ] = f_ẋ[x*(t₂*)]ᵀ ξ(t₂*) + f[x*(t₂*)] τ,  δG_{k,(x*, t₂*)}[ξ, τ] = ϕ_{k,t}[x*(t₂*)] τ + ϕ_{k,x}[x*(t₂*)]ᵀ ( ξ(t₂*) + ẋ*(t₂*) τ )

where the usual compressed notation is used. Based on the differentiability assumptions on f and ϕ it is clear that these Gâteaux derivatives exist and are continuous. Further, since the rank condition Rank [ ϕ t ϕ x ] = n x holds at ( x ( t 2 ) , t 2 ) one can always find n x (independent) directions ( ξ ¯ k , τ ¯ k ) 𝒞 1 [ t 1 , t 2 ] n x × such that the regularity condition:

| δ G 1 , ( x , t 2 ) [ ξ ¯ 1 , τ ¯ 1 ] δ G 1 , ( x , t 2 ) [ ξ ¯ n x , τ ¯ n x ] δ G n x , ( x , t 2 ) [ ξ ¯ 1 , τ ¯ 1 ] δ G n x , ( x , t 2 ) [ ξ ¯ n x , τ ¯ n x ] | 0

is satisfied. Now, consider the linear subspace
Ξ : = { ( ξ , τ ) 𝒞 1 [ t 1 , T ] n x × : ξ ( t 1 ) = 0 } . Since ( x , t 2 ) gives a (local) minimum for F on 𝒟 by Theorem 3.13 and Remark 3.23 there exist a vector λ n x such that:

0 = δF_{(x*, t₂*)}[ξ, τ] + Σ_{i=1}^{n_g} λ_i δG_{i,(x*, t₂*)}[ξ, τ] = f_ẋ[x*(t₂*)]ᵀ ξ(t₂*) + f[x*(t₂*)] τ + Σ_{k=1}^{n_x} λ_k ( ϕ_{k,t}[x*(t₂*)] τ + ϕ_{k,x}[x*(t₂*)]ᵀ ( ξ(t₂*) + ẋ*(t₂*) τ ) ).

The above equation can be rewritten using the vectorial form as:

0 = f_ẋ[x*(t₂*)]ᵀ ξ(t₂*) + ( f[x*(t₂*)] + λᵀ ϕ_t[x*(t₂*)] ) τ + λᵀ ϕ_x[x*(t₂*)] ( ξ(t₂*) + ẋ*(t₂*) τ )

which must hold for every (ξ, τ) ∈ Ξ. In particular, we consider variations ξ such that ξ(t₂*) = −ẋ*(t₂*) τ, so that the last term vanishes and we get:

( f[x*(t₂*)] − f_ẋ[x*(t₂*)]ᵀ ẋ*(t₂*) + λᵀ ϕ_t[x*(t₂*)] ) τ = 0

that must hold for each admissible τ and thus we get:

λᵀ ϕ_t[x*(t₂*)] = −f[x*(t₂*)] + f_ẋ[x*(t₂*)]ᵀ ẋ*(t₂*) =: b_t.

Now, considering variations such that τ = 0 while ξ(t₂*) = (0, …, ξ_i(t₂*), …, 0), we get for each i:

( f_{ẋ_i}[x*(t₂*)] + λᵀ ϕ_{x_i} ) ξ_i(t₂*) = 0

and thus we have the equation:

λᵀ ϕ_{x_i} = −f_{ẋ_i}[x*(t₂*)] =: b_{x,i}

for each i = 1 , , n x . In vectorial notation we have the following n x dimensional system of equations:

λᵀ [ ϕ_t  ϕ_x ] = [ −f[x*(t₂*)] + f_ẋ[x*(t₂*)]ᵀ ẋ*(t₂*)   −f_ẋ[x*(t₂*)]ᵀ ]

that following our definitions of b t and b x becomes:

λ [ ϕ t ϕ x ] = [ b t b x ]

that is, a linear system of equations of the form λᵀA = [b_t b_x], where A = [ϕ_t ϕ_x]. Note that [ϕ_t ϕ_x] is the Jacobian matrix J_ϕ(t₂*, x*(t₂*)), whose dimensions are n_x × (n_x + 1), and by hypothesis its rank is n_x. Therefore, by the rank-nullity theorem, dim(Ker(J_ϕ(t₂*, x*(t₂*)))) = 1. We can therefore pick an (n_x + 1)-dimensional vector d = [d_t d_x]ᵀ ∈ Ker(J_ϕ(t₂*, x*(t₂*))) and post-multiply the previous equation by it; noting that [ϕ_t ϕ_x] d = 0 since d ∈ Ker(J_ϕ(t₂*, x*(t₂*))), we get:

[ b t b x ] d = 0

note that if d belongs to the kernel of the Jacobian at (x*(t₂*), t₂*), then so does −d; hence it also holds that:

[ b t b x ] d = 0

Substituting the definition of b t and b x we get:

[ ( f f x ˙ x ˙ ) d t + f x ˙ d x ] x , t 2 = 0 [ d t d x ] Ker [ ϕ t ϕ x ] x , t 2

Remark 3.24: one-dimensional case

For the one-dimensional case we have:

[ ( f f ) d t + f d x ] x , t 2 = 0 ( d t d x ) Ker [ ϕ t ϕ x ] x , t 2

note that [ϕ_t ϕ_x]_{x*, t₂*} =: v reduces to a row vector in ℝ²; the kernel is then simply the subspace of vectors d ∈ ℝ² such that:

v d = 0 d t ϕ t + d x ϕ x = 0 ,

by picking d_x = ϕ_t and d_t = −ϕ_x, the transversality condition reduces to:

[ (f − ẋ f_ẋ) ϕ_x − f_ẋ ϕ_t ]_{x*, t₂*} = 0

Remark 3.25

Note that ϕ ∈ 𝒞¹([t₁, T] × ℝ^{n_x})^{n_x}; therefore its Jacobian is an n_x × (n_x + 1) matrix of the form:

J ϕ = ϕ [ t x ] = [ ϕ 1 t ϕ 1 x 1 ϕ 1 x n x ϕ n x t ϕ n x x 1 ϕ n x x n x ] = [ ϕ t ϕ x ]

and it is a matrix-valued function of (t, x), that is, J_ϕ(t, x). Its rank is at most n_x, attained when all of its rows are linearly independent.

Example 3.15 (Minimum Path Problem with Variable End-Point and End-Time). Consider the problem to minimize the distance between a fixed point A = (x_A, y_A) and a point B = (x_B, y_B) ∈ {(x, y) ∈ ℝ² : y = ax + b} in the (x, y)-plane. We want to find the curve y(x), x_A ≤ x ≤ x_B, such that the functional F[y, x_B] := ∫_{x_A}^{x_B} √(1 + y′(x)²) dx is minimized, subject to the constraints y(x_A) = y_A and y(x_B) = a x_B + b. Note that neither x_B nor y(x_B) is explicitly given. Instead, they are implicitly given by the scalar end-point constraint ϕ(x_B, y(x_B)) = y(x_B) − a x_B − b = 0, where ϕ : ℝ² → ℝ is obviously a 𝒞¹ function. We stress the fact that ϕ(x, y) = 0 (i.e., the zero level set of the function ϕ) is a straight line in ℝ². Therefore, we are looking for the minimum path from a fixed point A to a point B on a given straight line. For the sake of simplicity let us consider A = (0, 0); from Theorem 3.14 we know that an extremal y* must satisfy the Euler equation in the interval x ∈ [x_A, x_B]. Note that x_B is yet unknown. As we have already seen, the extremals for minimum path problems are straight lines. Since we have assigned a boundary condition at A, we get:

y ( x ) = C 1 x

which is a family of straight lines passing through the origin. In order to find C₁ and x_B we apply the necessary condition of Theorem 3.14 adapted to the one-dimensional case, that is:

[ (f − y′ f_{y′}) ϕ_y − f_{y′} ϕ_x ]_{y*, x_B} = 0

Substituting f = √(1 + C₁²), f_{y′} = C₁/√(1 + C₁²), ϕ_y = 1, ϕ_x = −a, we get:

√(1 + C₁²) − C₁²/√(1 + C₁²) + a C₁/√(1 + C₁²) = 0

and upon simplification:

(1 + a C₁)/√(1 + C₁²) = 0 ⟺ C₁ = −1/a

which precisely means that the extremal curve y* = −(1/a)x is perpendicular to the straight line defined by the constraint. In this simple case this is a global condition, but in general the transversality condition is a local condition at the end-point relating the "gradient" of the constraint and the extremal trajectory. Note that we have also implicitly found x_B as the intersection between the lines y = −(1/a)x and y = ax + b, that is, x_B = −ab/(a² + 1). A graphical representation of this problem is shown in Figure 3.13.


Figure 3.13:: Straight lines originating from the origin (i.e., point A) in blue, optimal trajectory y* in red, zero-level set of ϕ in green. Optimal point B = (x_B, y*(x_B)) as a black dot. Simple case with a = 1 and b = 2.
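The conclusions of this example are easy to verify numerically; the sketch below uses the values C₁ = −1/a and x_B = −ab/(a² + 1) derived above, for the case a = 1, b = 2 of Figure 3.13:

```python
a, b = 1.0, 2.0  # constraint line y = a*x + b, as in Figure 3.13

# Transversal extremal through the origin: y(x) = C1*x with slope C1 = -1/a,
# i.e. perpendicular to the constraint line.
C1 = -1.0 / a
xB = -a * b / (a * a + 1.0)   # intersection of y = C1*x with y = a*x + b
yB = C1 * xB

# B lies on the constraint line...
assert abs(yB - (a * xB + b)) < 1e-12
# ...and the extremal meets it at a right angle (product of slopes = -1).
assert abs(C1 * a + 1.0) < 1e-12
print(xB, yB)  # -1.0 1.0
```

For a = 1, b = 2 this recovers the black dot B = (−1, 1) of Figure 3.13.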

3.5.2 Problems with isoperimetric constraints

An isoperimetric problem of the calculus of variations is a problem wherein one or more constraints involve a functional in the form of an integral over part or all of the integration horizon [t₁, t₂]. Typically:

min x ( t ) 𝒟 F [ x ] = t 1 t 2 f ( t , x ( t ) , x ˙ ( t ) ) d t  subject to:  G [ x ] = t 1 t 2 ψ ( t , x ( t ) , x ˙ ( t ) ) d t = K

where K is a given number. Note that this problem already fits the framework that we have treated so far since the constraint is in the form of a level-set of the functional G [ x ] . We will use the hybrid approach in Remark 3.23 to deal both with simpler constraints imposed by the function space 𝒟 (e.g. fixed end-points) and with the isoperimetric constraint G [ x ] = K .

Theorem 3.15: First-Order Necessary Conditions for Isoperimetric Problems

Consider the problem to minimize the functional:

F [ x ] : = t 1 t 2 f ( t , x ( t ) , x ˙ ( t ) ) d t

on 𝒟 : = { x 𝒞 1 [ t 1 , t 2 ] n x : x ( t 1 ) = x 1 , x ( t 2 ) = x 2 } , subject to the isoperimetric constraints:

G i [ x ] : = t 1 t 2 ψ i ( t , x ( t ) , x ˙ ( t ) ) d t = K i , i = 1 , , n g

with f 𝒞 1 ( [ t 1 , t 2 ] × 2 × n x ) and ψ i 𝒞 1 ( [ t 1 , t 2 ] × 2 × n x ) , i = 1 , , n g . Suppose that x 𝒟 gives a (local) minimum for this problem, and

| δ G 1 , x [ ξ ¯ 1 ] δ G 1 , x [ ξ ¯ n g ] δ G n g , x [ ξ ¯ 1 ] δ G n g , x [ ξ ¯ n g ] | 0

for n g (independent) directions ξ ¯ 1 , , ξ ¯ n g 𝒞 1 [ t 1 , t 2 ] n x . Then, there exists a constant vector λ n g such that x is a solution to the so called Euler-Lagrange’s equation:

(d/dt) ℓ_{ẋᵢ}(t, x*(t), ẋ*(t), λ*) = ℓ_{xᵢ}(t, x*(t), ẋ*(t), λ*), i = 1, …, n_x

where:

ℓ(t, x(t), ẋ(t), λ) := f(t, x(t), ẋ(t)) + λᵀ ψ(t, x(t), ẋ(t)) = f(t, x(t), ẋ(t)) + Σ_{i=1}^{n_g} λᵢ ψᵢ(t, x(t), ẋ(t)).

Proof. Remark first that, from the differentiability assumptions on f and ψ i , i = 1 , , n g the Gâteaux derivatives δ F x [ ξ ] and δ G i , x [ ξ ] , i = 1 , , n g , exist and are continuous for every ξ 𝒞 1 [ t 1 , t 2 ] n x .

Since x 𝒟 gives a (local) minimum for F on 𝒟 constrained to Γ ( K ) : = { x 𝒞 1 [ t 1 , t 2 ] n x : 𝒢 i ( x ) = K i , i = 1 , , n g } , and x is a regular point for the constraints, by Theorem 3.13 (and Remark 3.23 ), there exists a constant vector λ n g such that:

δ F x [ ξ ] + i = 1 n g λ i δ G i , x [ ξ ] = 0

for each 𝒟 -admissible direction ξ . Observe that this latter condition is equivalent to that of finding a minimizer to the functional:

F̄[x] := ∫_{t₁}^{t₂} ( f(t, x(t), ẋ(t)) + λ*ᵀ ψ(t, x(t), ẋ(t)) ) dt = ∫_{t₁}^{t₂} ℓ(t, x(t), ẋ(t), λ*) dt

on 𝒟 . The conclusion then directly follows upon applying Theorem 3.3. □

Remark 3.26: First Integrals

Similar to free problems of the calculus of variations (see Remark 3.11) it is easy to show that the Hamiltonian function H ( x ( t ) , x ˙ ( t ) , λ ) defined as

H := ẋᵀ ℓ_ẋ − ℓ

is constant along an extremal trajectory provided that does not depend on the independent variable t . Recall the definition = f ( x ( t ) , x ˙ ( t ) ) + λ ψ ( x ( t ) , x ˙ ( t ) ) and that along an extremal trajectory the necessary condition of Theorem 3.15 holds, that is:

(d/dt) ℓ_ẋ − ℓ_x = 0.

The total time derivative of the Hamiltonian is:

dH/dt = ẍᵀ ℓ_ẋ + ẋᵀ (d/dt) ℓ_ẋ − ℓ_xᵀ ẋ − ℓ_ẋᵀ ẍ = ẋᵀ ( (d/dt) ℓ_ẋ − ℓ_x ) = 0

Example 3.16 (Solution to Dido's Problem). We are finally ready to face the problem shown in Example 3.3. Dido's problem is to maximize the functional:

max_{y ∈ 𝒟} F[y] = ∫_{x_A}^{x_B} y dx ⁸

subject to the isoperimetric 9 constraint:

G[y] = ∫_{x_A}^{x_B} ds = ∫_{x_A}^{x_B} √(1 + y′²) dx = L

The subspace 𝒟 is 𝒟 = { y ∈ 𝒞¹[x_A, x_B] such that y(x_A) = y_A, y(x_B) = y_B }. We now build the function ℓ as:

ℓ = y + λ √(1 + y′²)

Since ℓ is independent of x, the Hamiltonian function is constant along an extremal trajectory, that is:

−H = ℓ − y′ ℓ_{y′} = C₁

in this case we have ℓ_{y′} = λ y′/√(1 + y′²), hence:

y + λ √(1 + y′²) − λ y′²/√(1 + y′²) = C₁.

Manipulating the previous relation we get:

( y + λ √(1 + y′²) ) √(1 + y′²) − λ y′² = C₁ √(1 + y′²) ⟺ (C₁ − y) √(1 + y′²) = λ

which is a separable nonlinear differential equation; upon squaring both sides and after some algebraic manipulation we get:

y′ = √(λ² − (C₁ − y)²) / (C₁ − y)

and by formally expressing y′ = dy/dx and integrating both sides we have:

∫_{y_A}^{y} (C₁ − y)/√(λ² − (C₁ − y)²) dy = ∫_{x_A}^{x} dx.

In order to solve the integral in dy we make the substitution y = C₁ − λ cos(θ), dy = λ sin(θ) dθ, and hence:

∫_{θ_A}^{θ} ( λ cos(θ) / √(λ² − λ² cos²(θ)) ) λ sin(θ) dθ = ∫_{θ_A}^{θ} λ cos(θ) dθ

in a similar way as for the brachistochrone problem we get the extremal trajectory in parametric form as:

y(θ) = C₁ + λ̂ cos(θ),  x(θ) = C₂ + λ̂ sin(θ)

where we have defined λ̂ := −λ. Note that (x*, y*) defines a circular arc of radius λ̂ centered at (C₂, C₁). This is easy to see considering that:

y − C₁ = λ̂ cos(θ),  x − C₂ = λ̂ sin(θ)

and thus:

(y − C₁)² + (x − C₂)² = λ̂²

however, we still need to compute the actual value of the radius λ̂ and the center coordinates (C₂, C₁). Assume for the sake of simplicity that x_A = −1, x_B = 1 and y_A = y_B = 0. Substituting the boundary conditions in the previous equation gives:

C₁² + (−1 − C₂)² = λ̂²,  C₁² + (1 − C₂)² = λ̂²

by subtracting these two conditions we have:

(1 − C₂)² − (−1 − C₂)² = −4 C₂ = 0 ⟹ C₂ = 0

therefore the center of this family of circles is at (0, C₁), and each circle passes through A = (−1, 0) and B = (1, 0). We still need to compute the radius λ̂ and the ordinate C₁ of the center. Indeed, these two parameters depend on the isoperimetric constraint, i.e., the given length L. The length of an arc of circle of radius λ̂ between θ_A and θ_B (yet unknown) is L = λ̂(θ_B − θ_A); this comes from the definition of radians, and obviously the same relation holds if we compute it analytically as:

\[ G[y(\theta), x(\theta)] = \int_{\theta_A}^{\theta_B} \sqrt{x_\theta^2 + y_\theta^2}\, d\theta = \hat{\lambda}(\theta_B - \theta_A) = L \]

$\theta_A$ and $\theta_B$ can be expressed in terms of $C_1$ by using the boundary conditions at $B$ and $A$:

\[ \frac{x(\theta_B)}{y(\theta_B) - C_1} = \frac{1}{-C_1} = \frac{\hat{\lambda}\sin\theta_B}{\hat{\lambda}\cos\theta_B} = \tan\theta_B, \qquad \frac{x(\theta_A)}{y(\theta_A) - C_1} = \frac{-1}{-C_1} = \frac{\hat{\lambda}\sin\theta_A}{\hat{\lambda}\cos\theta_A} = \tan\theta_A. \]

Since the arctangent function is an odd function we have $\theta_A = -\theta_B = \arctan(1/C_1)$, while from the cartesian expression of the circle with $C_2 = 0$ at point $B$ we get:

\[ C_1^2 + 1 = \hat{\lambda}^2 \quad\Longrightarrow\quad \hat{\lambda} = \sqrt{C_1^2 + 1} \]

finally the expression for the length $L$ contains the only unknown $C_1$ (negative when the arc lies above the $x$-axis):

\[ L = \hat{\lambda}(\theta_B - \theta_A) = -2\sqrt{C_1^2 + 1}\,\arctan\!\left(\frac{1}{C_1}\right) \]

which, given $L$, can be solved numerically for $C_1$, the ordinate of the center of the circle; then using $\hat{\lambda} = \sqrt{C_1^2 + 1}$ we can retrieve the value of the Lagrange multiplier $\lambda = -\sqrt{C_1^2 + 1}$ which, interestingly enough, in this case is also the radius of the circle changed of sign. $L$ as a function of $C_1$ is plotted in Figure 3.14. The extremal for the simple case $C_1 = -1$, which gives $\lambda = -\sqrt{2}$ and $\theta_B = \pi/4$, is plotted in Figure 3.15.

Figure 3.14: Plot of the function $L(C_1) = -2\sqrt{C_1^2 + 1}\,\arctan(1/C_1)$. Note that we are not completely free in choosing the arc length $L$: as $C_1 \to 0^-$, $L \to \pi$, while as $C_1 \to -\infty$, $L \to 2$; that is, $2 < L < \pi$ for the problem to be well-posed.

Figure 3.15: Extremal trajectory in red from point $A$ to $B$; geometric construction in blue.
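The transcendental equation for $C_1$ has no closed form, but since $L(C_1)$ is monotone it can be solved by bisection. Below is a minimal numerical sketch in plain Python; the function names, tolerances, and the parametrization by $t = -C_1 > 0$ are our own choices, assuming the sign conventions used above.

```python
import math

def arc_length(t):
    # L as a function of t = -C1 > 0: L(t) = 2*sqrt(t^2 + 1)*arctan(1/t).
    # L decreases monotonically from pi (as t -> 0+) to 2 (as t -> inf).
    return 2.0 * math.sqrt(t * t + 1.0) * math.atan(1.0 / t)

def solve_center_ordinate(L):
    # Bisection for the t with arc_length(t) == L; returns C1 = -t.
    assert 2.0 < L < math.pi, "well-posedness requires 2 < L < pi"
    lo, hi = 1e-9, 1e9
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if arc_length(mid) > L:
            lo = mid   # arc still too long: move toward larger t
        else:
            hi = mid
    return -0.5 * (lo + hi)

# The simple case L = pi/sqrt(2) should recover C1 = -1, lambda = -sqrt(2).
C1 = solve_center_ordinate(math.pi / math.sqrt(2.0))
lam = -math.sqrt(C1 * C1 + 1.0)   # Lagrange multiplier = -radius
```

For $L = \pi/\sqrt{2} \approx 2.221$ the routine returns $C_1 \approx -1$ and $\lambda \approx -\sqrt{2}$, matching the simple case discussed above.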

Example 3.17 (Problem of the Surface of Revolution of Minimum Area). Consider the problem of finding the smooth curve $y(x) \geq 0$ having a fixed length $L > 0$, joining two given points $A = (x_A, y_A)$ and $B = (x_B, y_B)$, and generating a surface of revolution around the $x$-axis of minimum area. In mathematical terms, the problem consists of finding a minimizer of the functional:

\[ F[y] := 2\pi \int_{x_A}^{x_B} y(x)\sqrt{1 + (y'(x))^2}\, dx \]

on $\mathcal{D} := \{ y \in \mathcal{C}^1[x_A, x_B] : y(x_A) = y_A,\ y(x_B) = y_B \}$, subject to the isoperimetric constraint

\[ G[y] := \int_{x_A}^{x_B} \sqrt{1 + (y'(x))^2}\, dx = L. \]

Let us drop the coefficient 2 π in F and introduce the Lagrangian as

\[ \mathcal{L}(y(x), y'(x), \lambda) := y(x)\sqrt{1 + (y'(x))^2} + \lambda\sqrt{1 + (y'(x))^2} \]

since $\mathcal{L}$ does not depend on the independent variable $x$, we use the fact that the Hamiltonian $H$ must be constant along an extremal trajectory $\bar{y}(x)$,

\[ H := \mathcal{L} - \bar{y}'\,\mathcal{L}_{\bar{y}'} = C_1 \]

for some constant C 1 . That is:

\[ (\bar{y}(x) + \lambda)\sqrt{1 + \bar{y}'(x)^2} - \frac{(\bar{y}(x) + \lambda)\,\bar{y}'(x)^2}{\sqrt{1 + \bar{y}'(x)^2}} = C_1 \]

that can be rearranged as:

\[ \frac{(\bar{y}(x) + \lambda)(1 + \bar{y}'(x)^2) - (\bar{y}(x) + \lambda)\,\bar{y}'(x)^2}{\sqrt{1 + \bar{y}'(x)^2}} = \frac{\bar{y}(x) + \lambda}{\sqrt{1 + \bar{y}'(x)^2}} = C_1 \]

that with some simplification gives the separable differential equation:

\[ \frac{C_1}{\sqrt{(\lambda + y)^2 - C_1^2}}\, dy = dx \]

again we solve the integral by substituting $y = C_1\cosh\theta - \lambda$ and $dy = C_1\sinh\theta\, d\theta$, and thus:

\[ \int_{y_A}^{y} \frac{C_1}{\sqrt{(\lambda + y)^2 - C_1^2}}\, dy = \int_{\theta_A}^{\theta} \frac{C_1}{\sqrt{(C_1\cosh\theta)^2 - C_1^2}}\, C_1\sinh\theta\, d\theta \]

and by using the relation $\cosh^2\theta - \sinh^2\theta = 1$ we obtain:

\[ \int_{\theta_A}^{\theta} \frac{C_1}{\sqrt{(C_1\cosh\theta)^2 - C_1^2}}\, C_1\sinh\theta\, d\theta = C_1\theta + C_2 \]

and assuming x A = 0 we have:

\[ x = C_1\theta + C_2 \quad\Longrightarrow\quad \theta = \frac{x - C_2}{C_1} \]

finally, substituting back into $y = C_1\cosh\theta - \lambda$ we obtain:

\[ \bar{y}(x) = C_1\cosh\!\left(\frac{x - C_2}{C_1}\right) - \lambda. \]

The constants $C_1$, $C_2$ and $\lambda$ are to be found from the boundary conditions and the isoperimetric constraint. Note that the extremal curves are a family of catenaries. For the sake of simplicity let us do the inverse reasoning: we assign $C_1$, $C_2$, $\lambda$ and compute the boundary conditions. Note that $C_2$ is a translation along the $x$-axis while $\lambda$ is a translation along the $y$-axis. With $C_1 = C_2 = 10$, $\lambda = 5.4$ and $x_A = 0$, $x_B = 20$ we get the results of Figures 3.16 and 3.17, where the length $L$ of the curve is:

\[ L = G[\bar{y}] = \int_{x_A}^{x_B} \sqrt{1 + \sinh^2\!\left(\frac{x - C_2}{C_1}\right)}\, dx = 23.5 \]

note that the length $L$ does not depend directly on the Lagrange multiplier $\lambda$, while the surface area is:

\[ A = F[\bar{y}] = 2\pi \int_{x_A}^{x_B} \bar{y}(x)\sqrt{1 + \bar{y}'(x)^2}\, dx = 970.26 \]

Figure 3.16: Generating curve $\bar{y}$.

Figure 3.17: Generated surface of revolution of minimum area.
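The length and area values reported above can be cross-checked numerically. The following is a minimal sketch in plain Python; the `simpson` helper and the step count are our own choices, not from the text.

```python
import math

C1, C2, lam = 10.0, 10.0, 5.4   # parameters chosen in the example
xA, xB = 0.0, 20.0

def y(x):                        # catenary ybar(x) = C1*cosh((x - C2)/C1) - lambda
    return C1 * math.cosh((x - C2) / C1) - lam

def yp(x):                       # ybar'(x) = sinh((x - C2)/C1)
    return math.sinh((x - C2) / C1)

def simpson(f, a, b, n=2000):
    # Composite Simpson rule with n (even) subintervals.
    h = (b - a) / n
    s = f(a) + f(b) + sum((4 if k % 2 else 2) * f(a + k * h) for k in range(1, n))
    return s * h / 3.0

length = simpson(lambda x: math.sqrt(1.0 + yp(x) ** 2), xA, xB)   # ~ 23.50
area = simpson(lambda x: 2.0 * math.pi * y(x) * math.sqrt(1.0 + yp(x) ** 2),
               xA, xB)                                            # ~ 970.26
```

Here the length also has the closed form $2C_1\sinh\!\big(\tfrac{x_B - C_2}{C_1}\big) = 20\sinh(1) \approx 23.5$, confirming that it does not depend on $\lambda$.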

Example 3.18 (Solution to hanging rope with fixed length). We give the solution to the problem shown in Example 3.2. Given a rope of length $L$ attached at two poles $A = (x_A, y_A)$ and $B = (x_B, y_B)$, find the function $y(x)$ that describes the shape assumed by the rope under the action of gravity. We want to minimize the functional expressing the potential energy subject to the isoperimetric constraint expressing the fixed length. That is:

\[ F[y] := \int_{x_A}^{x_B} y(x)\sqrt{1 + (y'(x))^2}\, dx \]

on $\mathcal{D} := \{ y \in \mathcal{C}^1[x_A, x_B] : y(x_A) = y_A,\ y(x_B) = y_B \}$, subject to the isoperimetric constraint

\[ G[y] := \int_{x_A}^{x_B} \sqrt{1 + (y'(x))^2}\, dx = L. \]

This is exactly the same problem as the surface of minimum area. The rope assumes the shape of a catenary:

\[ y(x) = C_1\cosh\!\left(\frac{x - C_2}{C_1}\right) - \lambda \]

where again $C_1$, $C_2$, $\lambda$ are to be found from the boundary conditions and the length $L$.
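For the common special case of poles at equal height ($y_A = y_B$), the constants can be computed numerically: by symmetry $C_2$ is the midpoint of the span, the length constraint decouples as $L = 2C_1\sinh\!\big(\tfrac{x_B - x_A}{2C_1}\big)$ independently of $\lambda$, and $\lambda$ then follows from the boundary condition. Below is a minimal sketch in plain Python; the function name and bracketing interval are our own choices.

```python
import math

def catenary_params(xA, xB, yA, L):
    # Equal-height poles: C2 is the midpoint, and the length constraint
    # L = 2*C1*sinh(w/(2*C1)) is solved for C1 by bisection (the length
    # decreases monotonically in C1, approaching the span w as C1 -> inf).
    w = xB - xA
    assert L > w, "the rope must be longer than the span"
    C2 = 0.5 * (xA + xB)
    lo, hi = w / 50.0, 1e6 * w   # bracketing interval (our choice)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if 2.0 * mid * math.sinh(w / (2.0 * mid)) > L:
            lo = mid             # curve still too long: flatten it
        else:
            hi = mid
    C1 = 0.5 * (lo + hi)
    lam = C1 * math.cosh((xA - C2) / C1) - yA   # from y(xA) = yA
    return C1, C2, lam
```

For example, with $x_A = 0$, $x_B = 20$, $y_A = 0$ and $L = 20\sinh(1) \approx 23.5$ the routine recovers $C_1 = 10$, $C_2 = 10$ (the parameters used in Example 3.17, with $\lambda$ here fixed by $y_A$ rather than assigned).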

3.5.3 Problems with inequality constraints

The method of Lagrange multipliers can also be used to address problems of the calculus of variations having inequality constraints (or mixed equality and inequality constraints), as shown by the following:

Theorem 3.16: Existence and Uniqueness of the Lagrange Multipliers (Inequality Constraints)

Let $F$ and $G_1, \ldots, G_{n_g}$ be functionals defined in a neighborhood of $x^*$ in a normed linear space $(\mathcal{X}, \|\cdot\|)$, and having continuous and linear Gâteaux derivatives in this neighborhood. Suppose that $x^*$ is a (local) minimum point for $F$ constrained to $\Gamma(K) := \{ x \in \mathcal{X} : G_i(x) \leq K_i,\ i = 1, \ldots, n_g \}$ for some constant vector $K \in \mathbb{R}^{n_g}$. Suppose further that $n_a \leq n_g$ constraints, say $G_1, \ldots, G_{n_a}$ for simplicity, are active at $x^*$ and satisfy the regularity condition

\[ \begin{vmatrix} \delta G_{1,x^*}[\bar{\xi}_1] & \cdots & \delta G_{1,x^*}[\bar{\xi}_{n_a}] \\ \vdots & & \vdots \\ \delta G_{n_a,x^*}[\bar{\xi}_1] & \cdots & \delta G_{n_a,x^*}[\bar{\xi}_{n_a}] \end{vmatrix} \neq 0 \]

for $n_a$ (independent) directions $\bar{\xi}_1, \ldots, \bar{\xi}_{n_a} \in \mathcal{X}$ (the remaining constraints being inactive). Then, there exists a vector $\nu \in \mathbb{R}^{n_g}$ such that:

\[ \delta F_{x^*}[\xi] + \sum_{i=1}^{n_g} \nu_i\, \delta G_{i,x^*}[\xi] = 0 \qquad \forall \xi \in \mathcal{X} \]

and

\[ \nu_i \geq 0, \qquad \left( G_i[x^*] - K_i \right)\nu_i = 0 \]

for $i = 1, \ldots, n_g$.

Proof. Since the inequality constraints $G_{n_a+1}, \ldots, G_{n_g}$ are inactive, the nonnegativity and complementary slackness conditions are trivially satisfied by taking $\nu_{n_a+1} = \cdots = \nu_{n_g} = 0$. On the other hand, since the inequality constraints $G_1, \ldots, G_{n_a}$ are active and satisfy the regularity condition at $x^*$, the conclusion that there exists a unique vector $(\nu_1, \ldots, \nu_{n_a})$ such that the "stationarity condition" holds follows from Theorem 3.12; moreover, the complementary slackness condition is trivially satisfied for $i = 1, \ldots, n_a$, since $G_i[x^*] = K_i$ whenever constraint $i$ is active. Hence, it suffices to prove that the Lagrange multipliers $\nu_1, \ldots, \nu_{n_a}$ cannot assume negative values when $x^*$ is a (local) minimum.

We show the result by contradiction. Without loss of generality, suppose that ν 1 < 0 , and consider the ( n a + 1 ) × n a matrix A defined by

\[ A = \begin{bmatrix} \delta F_{x^*}[\bar{\xi}_1] & \cdots & \delta F_{x^*}[\bar{\xi}_{n_a}] \\ \delta G_{1,x^*}[\bar{\xi}_1] & \cdots & \delta G_{1,x^*}[\bar{\xi}_{n_a}] \\ \vdots & & \vdots \\ \delta G_{n_a,x^*}[\bar{\xi}_1] & \cdots & \delta G_{n_a,x^*}[\bar{\xi}_{n_a}] \end{bmatrix}. \]

By hypothesis, $\operatorname{Rank}(A) = n_a$ (the bottom $n_a \times n_a$ block is nonsingular by the regularity condition); hence the left null space $\operatorname{Ker}(A^{\mathsf{T}})$ has dimension lower than or equal to $1$. But from the stationarity condition the nonzero vector $(1, \nu_1, \ldots, \nu_{n_a}) \in \operatorname{Ker}(A^{\mathsf{T}})$. That is, $\operatorname{Ker}(A^{\mathsf{T}})$ has dimension equal to $1$, and $A^{\mathsf{T}} y = 0$ only if $y = \eta(1, \nu_1, \ldots, \nu_{n_a})$ for some $\eta \in \mathbb{R}$. Because $\nu_1 < 0$, there does not exist a $y \neq 0$ in $\operatorname{Ker}(A^{\mathsf{T}})$ such that $y_i \geq 0$ for every $i = 1, \ldots, n_a + 1$. Thus, by Gordan's Theorem there exists a vector $p \in \mathbb{R}^{n_a}$ such that $Ap < 0$ (componentwise), or equivalently:

\[ \sum_{k=1}^{n_a} p_k\, \delta F_{x^*}[\bar{\xi}_k] < 0, \qquad \sum_{k=1}^{n_a} p_k\, \delta G_{i,x^*}[\bar{\xi}_k] < 0, \quad i = 1, \ldots, n_a. \]

The Gâteaux derivatives of F , G 1 , , G n a being linear (by assumption), we get:

\[ \delta F_{x^*}\!\left[\sum_{k=1}^{n_a} p_k \bar{\xi}_k\right] < 0, \qquad \delta G_{i,x^*}\!\left[\sum_{k=1}^{n_a} p_k \bar{\xi}_k\right] < 0, \quad i = 1, \ldots, n_a. \]

In particular,

\[ \exists \delta > 0 \ \text{ such that }\ x^* + \eta \sum_{k=1}^{n_a} p_k \bar{\xi}_k \in \Gamma(K), \quad \forall\, 0 \leq \eta \leq \delta \]

that is, $x^*$ being a local minimum of $F$ on $\Gamma(K)$,

\[ \exists \delta' \in (0, \delta] \ \text{ such that }\ F\!\left[x^* + \eta \sum_{k=1}^{n_a} p_k \bar{\xi}_k\right] \geq F[x^*], \quad \forall\, 0 \leq \eta \leq \delta' \]

and we get

\[ \delta F_{x^*}\!\left[\sum_{k=1}^{n_a} p_k \bar{\xi}_k\right] = \lim_{\eta \to 0^+} \frac{F\!\left[x^* + \eta \sum_{k=1}^{n_a} p_k \bar{\xi}_k\right] - F[x^*]}{\eta} \geq 0 \]

which contradicts the inequality obtained earlier. □
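The stationarity, sign, and complementary-slackness conditions of Theorem 3.16 are easiest to visualize in a finite-dimensional analogue. The sketch below uses a toy problem of our own (not from the text): minimize $F(x) = (x_1 - 2)^2 + (x_2 - 2)^2$ subject to $G(x) = x_1 + x_2 \leq 2$, whose minimizer $x^* = (1, 1)$ makes the constraint active with multiplier $\nu = 2 \geq 0$.

```python
# Finite-dimensional illustration of the multiplier conditions:
# minimize F(x) = (x1 - 2)^2 + (x2 - 2)^2  subject to  G(x) = x1 + x2 <= 2.
x_star = (1.0, 1.0)          # minimizer: projection of (2, 2) onto the line
K = 2.0

grad_F = (2.0 * (x_star[0] - 2.0), 2.0 * (x_star[1] - 2.0))   # = (-2, -2)
grad_G = (1.0, 1.0)
nu = 2.0                     # Lagrange multiplier

# Stationarity: grad F + nu * grad G = 0, the finite-dimensional analogue
# of delta F + sum_i nu_i * delta G_i = 0.
stationary = all(abs(gF + nu * gG) < 1e-12 for gF, gG in zip(grad_F, grad_G))

# Sign and complementary slackness: nu >= 0 and (G(x*) - K) * nu = 0.
G_val = x_star[0] + x_star[1]
slack = abs((G_val - K) * nu) < 1e-12

assert nu >= 0.0 and stationary and slack
```

Setting $\nu_1 < 0$ here would flip the gradient of $F$ to point into the feasible side, exactly the descent direction the contradiction argument in the proof constructs.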